I want to have universal way of detecting specific CPU features. For this task I've created this function which takes EAX leaf number,Register name and bit number and returns true or false. It works fine for MMX/SSEx/AVX (EAX=1) but it does not detect AVX2 (EAX=7).
CPU: i5-4670k
OS: Windows 7
DetectCPUFeature('1','EDX',23) //DETECTS MMX CORRECTLY
DetectCPUFeature('1','EDX',25) //DETECTS SSE CORRECTLY
DetectCPUFeature('1','EDX',26) //DETECTS SSE2 CORRECTLY
DetectCPUFeature('1','ECX',0) //DETECTS SSE3 CORRECTLY
DetectCPUFeature('1','ECX',9) //DETECTS SSSE3 CORRECTLY
DetectCPUFeature('1','ECX',19) //DETECTS SSE4.1 CORRECTLY
DetectCPUFeature('1','ECX',20) //DETECTS SSE4.2 CORRECTLY
DetectCPUFeature('1','ECX',28) //DETECTS AVX CORRECTLY
DetectCPUFeature('7','EBX',5) //DOES NOT DETECT AVX2!
.
function DetectCPUFeature(EAX_Leaf_HEX,Register_Name:string;Bit:byte):boolean;
var _eax,_ebx,_ecx,_edx,EAX_Leaf,_Result: Longword;
x:integer;
Binary_mask:string;
Decimal_mask:int64;
begin
EAX_Leaf:=HexToInt(EAX_Leaf_HEX);
Binary_mask:='1';
for x:=1 to Bit do Binary_mask:=Binary_mask+'0';
Decimal_mask:=BinToInt(Binary_mask);
if AnsiUpperCase(Register_Name)='EDX' then
begin
asm
mov eax,EAX_Leaf // https://en.wikipedia.org/wiki/CPUID
db $0F,$A2 // db $0F,$A2 = CPUID instruction
mov _Result,edx
end;
end;
if AnsiUpperCase(Register_Name)='ECX' then
begin
asm
mov eax,EAX_Leaf
db $0F,$A2
mov _Result,ecx
end;
end;
if AnsiUpperCase(Register_Name)='EBX' then
begin
asm
mov eax,EAX_Leaf
db $0F,$A2
mov _Result,ebx
end;
end;
if (_Result and Decimal_mask) = Decimal_mask then DetectCPUFeature:=true
else DetectCPUFeature:=false;
end;
This sort of code is very dubious, mixing asm with Pascal code. Your code, in the asm blocks modifies registers and fails to restore them. That could easily be conflicting with the compiler's register usage. My strong advice to you is that you should never mix asm and Pascal in this way. Always use pure Pascal or pure asm.
What you need is a function that will perform the CPUID instruction and return you all the registers in a structure. You can then pick out what you want from that using Pascal code.
In addition, as #J... points out, you need to specify the sub-leaf value in the ECX register before invoking the CPUID instruction. That is a requirement for a number of the more recently added CPUID arguments.
This is the function you need:
type
TCPUID = record
EAX: Cardinal;
EBX: Cardinal;
ECX: Cardinal;
EDX: Cardinal;
end;
function GetCPUID(Leaf, Subleaf: Cardinal): TCPUID;
asm
push ebx
push edi
mov edi, ecx
mov ecx, edx
cpuid
mov [edi+$0], eax
mov [edi+$4], ebx
mov [edi+$8], ecx
mov [edi+$c], edx
pop edi
pop ebx
end;
I've written this for 32 bit code, but if you need to support 64 bit code also that support is easy enough to add.
function GetCPUID(Leaf, Subleaf: Integer): TCPUID;
asm
{$IF Defined(CPUX86)}
push ebx
push edi
mov edi, ecx
mov ecx, edx
cpuid
mov [edi+$0], eax
mov [edi+$4], ebx
mov [edi+$8], ecx
mov [edi+$c], edx
pop edi
pop ebx
{$ELSEIF Defined(CPUX64)}
mov r9,rcx
mov ecx,r8d
mov r8,rbx
mov eax,edx
cpuid
mov [r9+$0], eax
mov [r9+$4], ebx
mov [r9+$8], ecx
mov [r9+$c], edx
mov rbx, r8
{$ELSE}
{$Message Fatal 'GetCPUID has not been implemented for this architecture.'}
{$IFEND}
end;
With this at hand you can call CPUID passing any value as input, and retrieve all 4 registers of output, with which you can then do whatever you please.
Your code to create a bitmask is extremely inefficient and very far from idiomatic. Use 1 shl N to create a value with a single bit set, in position N.
Code like this:
if (_Result and Decimal_mask) = Decimal_mask then DetectCPUFeature:=true
else DetectCPUFeature:=false;
is also some way from idiomatic. That would normally be written like this:
DetectCPUFeature := value and mask <> 0;
You might end up with a wrapper function that looks like this:
type
TCPUIDRegister = (regEAX, regEBX, regECX, regEDX);
function GetCPUIDRegister(CPUID: TCPUID; Reg: TCPUIDRegister): Cardinal;
begin
case Reg of
regEAX:
Result := CPUID.EAX;
regEBX:
Result := CPUID.EBX;
regECX:
Result := CPUID.ECX;
regEDX:
Result := CPUID.EDX;
end;
end;
function CPUFeatureEnabled(Leaf, Subleaf: Cardinal; Reg: TCPUIDRegister; Bit: Integer): Boolean;
var
value: Cardinal;
begin
value := GetCPUIDRegister(GetCPUID(Leaf, Subleaf), Reg);
Result := value and (1 shl Bit) <> 0;
end;
While David's answer is excellent, the reason the function fails is that the ECX register is not set to zero (required for fetching extended info in the CPUID call).
See : How to detect New Instruction support in the 4th generation Intel® Core™ processor family
where AVX2 is found by (emphasis mine)
CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]==1
The following correctly returns the extended information and identifies AVX2 support.
if AnsiUpperCase(Register_Name)='EBX' then
begin
asm
push ecx { push ecx to stack}
mov ecx, 0 { set ecx to zero}
mov eax,EAX_Leaf
db $0F,$A2
mov _Result,ebx
pop ecx { restore ecx}
end;
The other asm functions have the same error as ECX is required to be zero for those calls also.
taken from Synopse Informatique:
type
/// the potential features, retrieved from an Intel CPU
// - see https://en.wikipedia.org/wiki/CPUID#EAX.3D1:_Processor_Info_and_Feature_Bits
TALIntelCpuFeature =
( { in EDX }
cfFPU, cfVME, cfDE, cfPSE, cfTSC, cfMSR, cfPAE, cfMCE,
cfCX8, cfAPIC, cf_d10, cfSEP, cfMTRR, cfPGE, cfMCA, cfCMOV,
cfPAT, cfPSE36, cfPSN, cfCLFSH, cf_d20, cfDS, cfACPI, cfMMX,
cfFXSR, cfSSE, cfSSE2, cfSS, cfHTT, cfTM, cfIA64, cfPBE,
{ in ECX }
cfSSE3, cfCLMUL, cfDS64, cfMON, cfDSCPL, cfVMX, cfSMX, cfEST,
cfTM2, cfSSSE3, cfCID, cfSDBG, cfFMA, cfCX16, cfXTPR, cfPDCM,
cf_c16, cfPCID, cfDCA, cfSSE41, cfSSE42, cfX2A, cfMOVBE, cfPOPCNT,
cfTSC2, cfAESNI, cfXS, cfOSXS, cfAVX, cfF16C, cfRAND, cfHYP,
{ extended features in EBX, ECX }
cfFSGS, cf_b01, cfSGX, cfBMI1, cfHLE, cfAVX2, cf_b06, cfSMEP, cfBMI2,
cfERMS, cfINVPCID, cfRTM, cfPQM, cf_b13, cfMPX, cfPQE, cfAVX512F,
cfAVX512DQ, cfRDSEED, cfADX, cfSMAP, cfAVX512IFMA, cfPCOMMIT,
cfCLFLUSH, cfCLWB, cfIPT, cfAVX512PF, cfAVX512ER, cfAVX512CD,
cfSHA, cfAVX512BW, cfAVX512VL, cfPREFW1, cfAVX512VBMI);
/// all features, as retrieved from an Intel CPU
TALIntelCpuFeatures = set of TALIntelCpuFeature;
var
/// the available CPU features, as recognized at program startup
ALCpuFeatures: TALIntelCpuFeatures;
{**}
type
_TRegisters = record
eax,ebx,ecx,edx: cardinal;
end;
{***************************************************************}
procedure _GetCPUID(Param: Cardinal; var Registers: _TRegisters);
{$IF defined(CPU64BITS)}
asm // ecx=param, rdx=Registers (Linux: edi,rsi)
.NOFRAME
mov eax, ecx
mov r9, rdx
mov r10, rbx // preserve rbx
xor ebx, ebx
xor ecx, ecx
xor edx, edx
cpuid
mov _TRegisters(r9).&eax, eax
mov _TRegisters(r9).&ebx, ebx
mov _TRegisters(r9).&ecx, ecx
mov _TRegisters(r9).&edx, edx
mov rbx, r10
end;
{$else}
asm
push esi
push edi
mov esi, edx
mov edi, eax
pushfd
pop eax
mov edx, eax
xor eax, $200000
push eax
popfd
pushfd
pop eax
xor eax, edx
jz #nocpuid
push ebx
mov eax, edi
xor ecx, ecx
cpuid
mov _TRegisters(esi).&eax, eax
mov _TRegisters(esi).&ebx, ebx
mov _TRegisters(esi).&ecx, ecx
mov _TRegisters(esi).&edx, edx
pop ebx
#nocpuid:
pop edi
pop esi
end;
{$ifend}
{******************************}
procedure _TestIntelCpuFeatures;
var regs: _TRegisters;
begin
regs.edx := 0;
regs.ecx := 0;
_GetCPUID(1,regs);
PIntegerArray(#ALCpuFeatures)^[0] := regs.edx;
PIntegerArray(#ALCpuFeatures)^[1] := regs.ecx;
_GetCPUID(7,regs);
PIntegerArray(#ALCpuFeatures)^[2] := regs.ebx;
PByteArray(#ALCpuFeatures)^[12] := regs.ecx;
end;
initialization
_TestIntelCpuFeatures;
I use this functions since D2007 I got it online, don't remember where.
But now in XE7 it return a compilation error:
"E2107 Operand size mismatch"
function FastCharPos(const aSource : string; const C: Char; StartPos : Integer) : Integer;
var
L : Integer;
begin
//If this assert failed, it is because you passed 0 for StartPos, lowest value is 1 !!
Assert(StartPos > 0);
Result := 0;
L := Length(aSource);
if L = 0 then exit;
if StartPos > L then exit;
Dec(StartPos);
asm
PUSH EDI //Preserve this register
mov EDI, aSource //Point EDI at aSource
add EDI, StartPos
mov ECX, L //Make a note of how many chars to search through
sub ECX, StartPos
mov AL, C //and which char we want :Error -"E2107 Operand size mismatch"
#Loop:
cmp Al, [EDI] //compare it against the SourceString
jz #Found
inc EDI
dec ECX
jnz #Loop
jmp #NotFound
#Found:
sub EDI, aSource //EDI has been incremented, so EDI-OrigAdress = Char pos !
inc EDI
mov Result, EDI
#NotFound:
POP EDI
end;
end;
function FastCharPosNoCase(const aSource : string; C: Char; StartPos : Integer) : Integer;
var
L : Integer;
begin
Result := 0;
L := Length(aSource);
if L = 0 then exit;
if StartPos > L then exit;
Dec(StartPos);
if StartPos < 0 then StartPos := 0;
asm
PUSH EDI //Preserve this register
PUSH EBX
mov EDX, GUpcaseLUT
mov EDI, aSource //Point EDI at aSource
add EDI, StartPos
mov ECX, L //Make a note of how many chars to search through
sub ECX, StartPos
xor EBX, EBX
mov BL, C //:Error -"E2107 Operand size mismatch"
mov AL, [EDX+EBX]
#Loop:
mov BL, [EDI]
inc EDI
cmp Al, [EDX+EBX]
jz #Found
dec ECX
jnz #Loop
jmp #NotFound
#Found:
sub EDI, aSource //EDI has been incremented, so EDI-OrigAdress = Char pos !
mov Result, EDI
#NotFound:
POP EBX
POP EDI
end;
end;
What do I need to update these two functions to XE7 win32?
What must I do?
Thanks.
This code was written for pre Unicode Delphi where Char is an alias for AnsiChar, the 8 bit character type. In Delphi 2009 and later, Char is an alias for WideChar the 16 bit character type.
The reason for the error message is that the code is intended to operate on 8 bit character elements, but you are providing 16 bit operands. The operator expects 8 bit operands, but you supplied 16 bit operands.
Change Char to AnsiChar to make this code compile and behave as intended on all versions of Delphi.
Having said that, I suggest you stop using this code. Instead use Pos. As a rule, it is preferable to use built-in library functions.
You should stop using old assembler version for string routines and use the use built-in library functions.
If you want to move on in a hurry you can reimplement you functions like this:
function FastCharPos(const aSource: string; const C: Char; StartPos: Integer): Integer; inline;
begin
Result := Pos(C, aSource, StartPos);
end;
function FastCharPosNoCase(const aSource: string; C: Char; StartPos: Integer): Integer; inline;
begin
Result := Pos(AnsiUppercase(C), AnsiUppercase(aSource), StartPos);
end;
I have the following function that works in Delphi 2006, but under Delphi XE2 it gives either an access violation error or a privileged instruction error when processing RET.
function Q_TrimChar(const S: string; Ch: Char): string;
asm
PUSH ESI
MOV ESI,ECX
TEST EAX,EAX
JE ##qt
MOV ECX,[EAX-4]
TEST ECX,ECX
JE ##qt
PUSH EBX
PUSH EDI
MOV EBX,EAX
MOV EDI,EDX
XOR EDX,EDX
MOV EAX,ESI
CALL System.#LStrFromPCharLen
MOV EDX,EDI
MOV ECX,[EBX-4]
##lp1: CMP DL,BYTE PTR [EBX]
JNE ##ex1
INC EBX
DEC ECX
JNE ##lp1
MOV EDX,[ESI]
JMP ##wq
##ex1: DEC ECX
##lp2: CMP DL,BYTE PTR [EBX+ECX]
JNE ##ex2
DEC ECX
JMP ##lp2
##ex2: MOV EDI,[ESI]
LEA EDX,[EDI+ECX+1]
##lp3: MOV AL,BYTE PTR [EBX+ECX]
MOV BYTE PTR [EDI+ECX],AL
DEC ECX
JNS ##lp3
##wq: MOV EAX,[ESI]
MOV BYTE PTR [EDX],0
SUB EDX,EAX
MOV [EAX-4],EDX
POP EDI
POP EBX
POP ESI
RET
##qt: MOV EAX,ESI
CALL System.#LStrClr
POP ESI
end;
I don't know assembly very well. What is the problem?
I completely agree with David's suggestion to simply code this in Pascal and have upvoted that answer. Unless profiling has indicated that this is a true bottleneck then there's really no need for the ASM. Here are two versions. The first is easier to read but the second is more efficient:
function Q_TrimChar(const S: string; Ch: Char): string;
begin
result := S;
while (result <> '') and (result[1] = Ch) do Delete(Result, 1, 1);
while (result <> '') and (result[Length(Result)] = Ch) do Delete(Result, Length(Result), 1);
end;
function Q_TrimChar(const S: string; Ch: Char): string;
var
First, Last : integer;
begin
First := 1;
Last := Length(S);
while (First < Last) and (S[First] = Ch) do inc(First);
while (Last >= First) and (S[Last] = Ch) do Dec(Last);
Result := copy(S, First, Last-First+1);
end;
Delphi 2006 uses single byte ANSI characters and so string is AnsiString, Char is AnsiChar. On Delphi 2009 and later, two byte Unicode characters are used. This function cannot possibly work on both compilers.
Even the standard hack of using AnsiString and AnsiChar does not work. Most likely the assumptions that this function makes about the RTL implementation are no longer valid in modern Delphi.
I would re-write this function in Pascal and let the compiler do the work. Not only will that be the quickest way to solve your current problem, it will also get you over the hurdle of 64-bit compilation should you ever choose to tackle that.
Can I rely on the fact, that an interface field in a record is always initialized to nil?
TMyRec = record
FGuard : IInterface;
FObject : TObject;
procedure CheckCreated;
end;
This would allow me to write:
procedure TMyCheck.CheckCreated;
begin
if (FGuard = nil) then
begin
FObject := TObject.Create;
FGuard := TGuard.Create (FObject);
end;
end;
(for automatic lifetime management)
I know that interface fields are initialized to nil but is that also true when contained in a record?
Yes you can rely on that.
All reference-counted variables:
Strings;
Dynamic arrays;
Variants;
Interfaces;
Nested records containing those kind of variables.
are initialized to nil when a record is allocated, if you use New or a dynamic array - even locally on the stack. Of course, if you use a plain GetMem or work with pointers, you'll have to initialize it by yourself (e.g. using a FillChar).
If you are curious, there is an hidden call to the following procedure of System.pas:
procedure _InitializeRecord(p: Pointer; typeInfo: Pointer);
This will fill all the reference-counted variables memory to 0, but won't set the other members of the record. In fact, in a class instance, the whole field memory is initialized with 0, including all members - for a record, the initialization is only for reference-counted types.
Note that in some cases, I've found out that this initialization was not properly generated if you use the object type instead of record - at least under Delphi 2009-2010. So if your code has some object type declaration, you may better switch to record (and loose inheritance), or explicitly call FillChar.
If you are curious, here is an optimized version I wrote in asm - available in our enhanced RTL.
procedure _InitializeRecord(p: Pointer; typeInfo: Pointer);
// this procedure is called at most object creation -> optimization rocks here!
asm
{ -> EAX pointer to record to be initialized }
{ EDX pointer to type info }
MOVZX ECX,[EDX+1] { type name length }
PUSH EBX
PUSH ESI
PUSH EDI
MOV EBX,EAX // PIC safe. See comment above
LEA ESI,[EDX+ECX+2+8] { address of destructable fields }
MOV EDI,[EDX+ECX+2+4] { number of destructable fields }
##loop:
mov edx,[esi] // type info
mov eax,[esi+4]
mov edx,[edx]
add esi,8
add eax,ebx // data to be initialized
movzx ecx,[edx] // data type
cmp ecx,tkLString
je ##LString
jb ##err
cmp ecx,tkDynArray
je ##DynArray
ja ##err
jmp dword ptr [ecx*4+##Tab-tkWString*4]
nop; nop; nop // align ##Tab
##Tab: dd ##WString,##Variant,##Array,##Record
dd ##Interface,##err
##LString:
##WString:
##Interface:
##DynArray: // zero 4 bytes in EAX
dec edi
mov dword ptr [eax],0
jg ##loop
POP EDI
POP ESI
POP EBX
RET
##Variant: // zero 16 bytes in EAX
xor ecx,ecx
dec edi
mov [eax],ecx
mov [eax+4],ecx
mov [eax+8],ecx
mov [eax+12],ecx
jg ##loop
jmp ##exit
##err:
MOV AL,reInvalidPtr
POP EDI
POP ESI
POP EBX
JMP Error
##Array:
##Record: // rarely called in practice
mov ecx,1
call _InitializeArray
dec edi
jg ##loop
##exit:
POP EDI
POP ESI
POP EBX
end;
In Delphi, the declaration of the DivMod function is
procedure DivMod(Dividend: Cardinal; Divisor: Word;
var Result, Remainder: Word);
Thus, the divisor, result, and remainder cannot be grater than 65535, a rather severe limitation. Why is this? Why couldn't the delcaration be
procedure DivMod(Dividend: Cardinal; Divisor: Cardinal;
var Result, Remainder: Cardinal);
The procedure is implemented using assembly, and is therefore probably extremely fast. Would it not be possible for the code
PUSH EBX
MOV EBX,EDX
MOV EDX,EAX
SHR EDX,16
DIV BX
MOV EBX,Remainder
MOV [ECX],AX
MOV [EBX],DX
POP EBX
to be adapted to cardinals? How much slower is the naïve attempt
procedure DivModInt(const Dividend: integer; const Divisor: integer; out result: integer; out remainder: integer);
begin
result := Dividend div Divisor;
remainder := Dividend mod Divisor;
end;
that is not (?) limited to 16-bit integers?
Such a procedure is possible. I have not tested the code enough, but I think it's OK:
procedure DivMod32(Dividend, Divisor: Cardinal; var Quotient, Remainder: Cardinal);
asm
PUSH EBX
MOV EBX,EDX
XOR EDX,EDX
DIV EBX
MOV [ECX],EAX
MOV EBX,Remainder
MOV [EBX],EDX
POP EBX
end;
Updated:
even more efficient:
function DivMod32(Dividend, Divisor: Cardinal; var Remainder: Cardinal): Cardinal;
asm
PUSH EBX
MOV EBX,EDX
XOR EDX,EDX
DIV EBX
MOV [ECX],EDX
POP EBX
end;
Updated 2:
You can see the assembly code generated by Delphi compiler in the Disassembly (or CPU) window. Eg, the procedure
procedure DivMod32(const Dividend: Cardinal; const Divisor: Cardinal;
out result: Cardinal; out remainder: Cardinal);
begin
result := Dividend div Divisor;
remainder := Dividend mod Divisor;
end;
generates code
Unit1.pas.28: begin
0046CC94 55 push ebp
0046CC95 8BEC mov ebp,esp
0046CC97 53 push ebx
0046CC98 56 push esi
0046CC99 8BF2 mov esi,edx
0046CC9B 8BD8 mov ebx,eax
Unit1.pas.29: result := Dividend div Divisor;
0046CC9D 8BC3 mov eax,ebx
0046CC9F 33D2 xor edx,edx
0046CCA1 F7F6 div esi
0046CCA3 8901 mov [ecx],eax
Unit1.pas.30: remainder := Dividend mod Divisor;
0046CCA5 8BC3 mov eax,ebx
0046CCA7 33D2 xor edx,edx
0046CCA9 F7F6 div esi
0046CCAB 8B4508 mov eax,[ebp+$08]
0046CCAE 8910 mov [eax],edx
Unit1.pas.31: end;
0046CCB0 5E pop esi
0046CCB1 5B pop ebx
0046CCB2 5D pop ebp
0046CCB3 C20400 ret $0004
This code is linear (contains no jumps) and modern processors (with long instruction pipeline) are very efficient in executing linear code. So though my DivMode32 implementation is about 3 times shorter, 60% is a reasonable estimate.