Code detects MMX/SSE/AVX but not AVX2 - delphi

I want to have universal way of detecting specific CPU features. For this task I've created this function which takes EAX leaf number,Register name and bit number and returns true or false. It works fine for MMX/SSEx/AVX (EAX=1) but it does not detect AVX2 (EAX=7).
CPU: i5-4670k
OS: Windows 7
DetectCPUFeature('1','EDX',23) //DETECTS MMX CORRECTLY
DetectCPUFeature('1','EDX',25) //DETECTS SSE CORRECTLY
DetectCPUFeature('1','EDX',26) //DETECTS SSE2 CORRECTLY
DetectCPUFeature('1','ECX',0) //DETECTS SSE3 CORRECTLY
DetectCPUFeature('1','ECX',9) //DETECTS SSSE3 CORRECTLY
DetectCPUFeature('1','ECX',19) //DETECTS SSE4.1 CORRECTLY
DetectCPUFeature('1','ECX',20) //DETECTS SSE4.2 CORRECTLY
DetectCPUFeature('1','ECX',28) //DETECTS AVX CORRECTLY
DetectCPUFeature('7','EBX',5) //DOES NOT DETECT AVX2!
.
function DetectCPUFeature(EAX_Leaf_HEX,Register_Name:string;Bit:byte):boolean;
var _eax,_ebx,_ecx,_edx,EAX_Leaf,_Result: Longword;
x:integer;
Binary_mask:string;
Decimal_mask:int64;
begin
EAX_Leaf:=HexToInt(EAX_Leaf_HEX);
Binary_mask:='1';
for x:=1 to Bit do Binary_mask:=Binary_mask+'0';
Decimal_mask:=BinToInt(Binary_mask);
if AnsiUpperCase(Register_Name)='EDX' then
begin
asm
mov eax,EAX_Leaf // https://en.wikipedia.org/wiki/CPUID
db $0F,$A2 // db $0F,$A2 = CPUID instruction
mov _Result,edx
end;
end;
if AnsiUpperCase(Register_Name)='ECX' then
begin
asm
mov eax,EAX_Leaf
db $0F,$A2
mov _Result,ecx
end;
end;
if AnsiUpperCase(Register_Name)='EBX' then
begin
asm
mov eax,EAX_Leaf
db $0F,$A2
mov _Result,ebx
end;
end;
if (_Result and Decimal_mask) = Decimal_mask then DetectCPUFeature:=true
else DetectCPUFeature:=false;
end;

This sort of code is very dubious, mixing asm with Pascal code. Your code, in the asm blocks modifies registers and fails to restore them. That could easily be conflicting with the compiler's register usage. My strong advice to you is that you should never mix asm and Pascal in this way. Always use pure Pascal or pure asm.
What you need is a function that will perform the CPUID instruction and return you all the registers in a structure. You can then pick out what you want from that using Pascal code.
In addition, as #J... points out, you need to specify the sub-leaf value in the ECX register before invoking the CPUID instruction. That is a requirement for a number of the more recently added CPUID arguments.
This is the function you need:
type
TCPUID = record
EAX: Cardinal;
EBX: Cardinal;
ECX: Cardinal;
EDX: Cardinal;
end;
function GetCPUID(Leaf, Subleaf: Cardinal): TCPUID;
asm
push ebx
push edi
mov edi, ecx
mov ecx, edx
cpuid
mov [edi+$0], eax
mov [edi+$4], ebx
mov [edi+$8], ecx
mov [edi+$c], edx
pop edi
pop ebx
end;
I've written this for 32 bit code, but if you need to support 64 bit code also that support is easy enough to add.
function GetCPUID(Leaf, Subleaf: Integer): TCPUID;
asm
{$IF Defined(CPUX86)}
push ebx
push edi
mov edi, ecx
mov ecx, edx
cpuid
mov [edi+$0], eax
mov [edi+$4], ebx
mov [edi+$8], ecx
mov [edi+$c], edx
pop edi
pop ebx
{$ELSEIF Defined(CPUX64)}
mov r9,rcx
mov ecx,r8d
mov r8,rbx
mov eax,edx
cpuid
mov [r9+$0], eax
mov [r9+$4], ebx
mov [r9+$8], ecx
mov [r9+$c], edx
mov rbx, r8
{$ELSE}
{$Message Fatal 'GetCPUID has not been implemented for this architecture.'}
{$IFEND}
end;
With this at hand you can call CPUID passing any value as input, and retrieve all 4 registers of output, with which you can then do whatever you please.
Your code to create a bitmask is extremely inefficient and very far from idiomatic. Use 1 shl N to create a value with a single bit set, in position N.
Code like this:
if (_Result and Decimal_mask) = Decimal_mask then DetectCPUFeature:=true
else DetectCPUFeature:=false;
is also some way from idiomatic. That would normally be written like this:
DetectCPUFeature := value and mask <> 0;
You might end up with a wrapper function that looks like this:
type
TCPUIDRegister = (regEAX, regEBX, regECX, regEDX);
function GetCPUIDRegister(CPUID: TCPUID; Reg: TCPUIDRegister): Cardinal;
begin
case Reg of
regEAX:
Result := CPUID.EAX;
regEBX:
Result := CPUID.EBX;
regECX:
Result := CPUID.ECX;
regEDX:
Result := CPUID.EDX;
end;
end;
function CPUFeatureEnabled(Leaf, Subleaf: Cardinal; Reg: TCPUIDRegister; Bit: Integer): Boolean;
var
value: Cardinal;
begin
value := GetCPUIDRegister(GetCPUID(Leaf, Subleaf), Reg);
Result := value and (1 shl Bit) <> 0;
end;

While David's answer is excellent, the reason the function fails is that the ECX register is not set to zero (required for fetching extended info in the CPUID call).
See : How to detect New Instruction support in the 4th generation Intel® Core™ processor family
where AVX2 is found by (emphasis mine)
CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]==1
The following correctly returns the extended information and identifies AVX2 support.
if AnsiUpperCase(Register_Name)='EBX' then
begin
asm
push ecx { push ecx to stack}
mov ecx, 0 { set ecx to zero}
mov eax,EAX_Leaf
db $0F,$A2
mov _Result,ebx
pop ecx { restore ecx}
end;
The other asm functions have the same error as ECX is required to be zero for those calls also.

taken from Synopse Informatique:
type
/// the potential features, retrieved from an Intel CPU
// - see https://en.wikipedia.org/wiki/CPUID#EAX.3D1:_Processor_Info_and_Feature_Bits
TALIntelCpuFeature =
( { in EDX }
cfFPU, cfVME, cfDE, cfPSE, cfTSC, cfMSR, cfPAE, cfMCE,
cfCX8, cfAPIC, cf_d10, cfSEP, cfMTRR, cfPGE, cfMCA, cfCMOV,
cfPAT, cfPSE36, cfPSN, cfCLFSH, cf_d20, cfDS, cfACPI, cfMMX,
cfFXSR, cfSSE, cfSSE2, cfSS, cfHTT, cfTM, cfIA64, cfPBE,
{ in ECX }
cfSSE3, cfCLMUL, cfDS64, cfMON, cfDSCPL, cfVMX, cfSMX, cfEST,
cfTM2, cfSSSE3, cfCID, cfSDBG, cfFMA, cfCX16, cfXTPR, cfPDCM,
cf_c16, cfPCID, cfDCA, cfSSE41, cfSSE42, cfX2A, cfMOVBE, cfPOPCNT,
cfTSC2, cfAESNI, cfXS, cfOSXS, cfAVX, cfF16C, cfRAND, cfHYP,
{ extended features in EBX, ECX }
cfFSGS, cf_b01, cfSGX, cfBMI1, cfHLE, cfAVX2, cf_b06, cfSMEP, cfBMI2,
cfERMS, cfINVPCID, cfRTM, cfPQM, cf_b13, cfMPX, cfPQE, cfAVX512F,
cfAVX512DQ, cfRDSEED, cfADX, cfSMAP, cfAVX512IFMA, cfPCOMMIT,
cfCLFLUSH, cfCLWB, cfIPT, cfAVX512PF, cfAVX512ER, cfAVX512CD,
cfSHA, cfAVX512BW, cfAVX512VL, cfPREFW1, cfAVX512VBMI);
/// all features, as retrieved from an Intel CPU
TALIntelCpuFeatures = set of TALIntelCpuFeature;
var
/// the available CPU features, as recognized at program startup
ALCpuFeatures: TALIntelCpuFeatures;
{**}
type
_TRegisters = record
eax,ebx,ecx,edx: cardinal;
end;
{***************************************************************}
procedure _GetCPUID(Param: Cardinal; var Registers: _TRegisters);
{$IF defined(CPU64BITS)}
asm // ecx=param, rdx=Registers (Linux: edi,rsi)
.NOFRAME
mov eax, ecx
mov r9, rdx
mov r10, rbx // preserve rbx
xor ebx, ebx
xor ecx, ecx
xor edx, edx
cpuid
mov _TRegisters(r9).&eax, eax
mov _TRegisters(r9).&ebx, ebx
mov _TRegisters(r9).&ecx, ecx
mov _TRegisters(r9).&edx, edx
mov rbx, r10
end;
{$else}
asm
push esi
push edi
mov esi, edx
mov edi, eax
pushfd
pop eax
mov edx, eax
xor eax, $200000
push eax
popfd
pushfd
pop eax
xor eax, edx
jz #nocpuid
push ebx
mov eax, edi
xor ecx, ecx
cpuid
mov _TRegisters(esi).&eax, eax
mov _TRegisters(esi).&ebx, ebx
mov _TRegisters(esi).&ecx, ecx
mov _TRegisters(esi).&edx, edx
pop ebx
#nocpuid:
pop edi
pop esi
end;
{$ifend}
{******************************}
procedure _TestIntelCpuFeatures;
var regs: _TRegisters;
begin
regs.edx := 0;
regs.ecx := 0;
_GetCPUID(1,regs);
PIntegerArray(#ALCpuFeatures)^[0] := regs.edx;
PIntegerArray(#ALCpuFeatures)^[1] := regs.ecx;
_GetCPUID(7,regs);
PIntegerArray(#ALCpuFeatures)^[2] := regs.ebx;
PByteArray(#ALCpuFeatures)^[12] := regs.ecx;
end;
initialization
_TestIntelCpuFeatures;

Related

Converting FindScanline assembly code to purepascal

I am trying to convert some Delphi 5 code to Delphi XE7-x64 and I am stuck on following code:
function FindScanline(Source : Pointer; MaxLen : Cardinal;
Value : Cardinal) : Cardinal; assembler;
asm
PUSH ECX
MOV ECX,EDX
MOV EDX,EDI
MOV EDI,EAX
POP EAX
REPE SCASB
MOV EAX,ECX
MOV EDI,EDX
end;
As far as I understand following things are occuring:
push the contents of ECX register(Value) onto the stack
move contents of EDX register(MaxLen) into ECX register. now ECX holds (MaxLen)
move contents of EDI register into EDX register. now EDX holds (EDI)
move contents of EAX register into EDI register. now EDI holds (Source)
pop ECX into EDX. now EDX holds (Value). Was (EDI) lost?
repeat while equal ?decrement ECX for each char?
move contents of ECX register into EAX register
move contents of EDX register into EDI register
For reference function FindScanline is used in function GetCursorHeightMargin
Any help in translating above will be appreciated.
Here is a literal translation:
function FindScanline(Source: Pointer; MaxLen: Cardinal; Value: Cardinal): Cardinal;
var
Ptr: PByte;
begin
Result := MaxLen;
if Result > 0 then
dec(Result);
Ptr := Source;
while (Result > 0) and (Ptr^ = Value) do
begin
inc(Ptr);
dec(Result);
end;
end;
It's rather messy to handle the edge cases unfortunately.

What does DUnit2's CallerAddr function do, and how do I convert it to 64 bits?

I am trying to get DUnit2 working under 64 bits, but I am stumped to what this method does, let alone how to convert it to 64 bits. Pure Pascal would better, but since it refers to the stack (ebp), it might not be possible.
function CallerAddr: Pointer; assembler;
const
CallerIP = $4;
asm
mov eax, ebp
call IsBadPointer
test eax,eax
jne ##Error
mov eax, [ebp].CallerIP
sub eax, 5 // 5 bytes for call
push eax
call IsBadPointer
test eax,eax
pop eax
je ##Finish
##Error:
xor eax, eax
##Finish:
end;
function RtlCaptureStackBackTrace(FramesToSkip: ULONG; FramesToCapture: ULONG;
out BackTrace: Pointer; BackTraceHash: PULONG): USHORT; stdcall;
external 'kernel32.dll' name 'RtlCaptureStackBackTrace' delayed;
function CallerAddr: Pointer;
begin
// Skip 2 Frames, one for the return of CallerAddr and one for the
// return of RtlCaptureStackBackTrace
if RtlCaptureStackBackTrace(2, 1, Result, nil) > 0 then
begin
if not IsBadPointer(Result) then
Result := Pointer(NativeInt(Result) - 5)
else
Result := nil;
end
else
begin
Result := nil;
end;
end;
function CallerAddr: Pointer; assembler;
const
CallerIP = $4;
asm
mov rax, rcx ;For int.. XMM0 for float
call IsBadPointer
test rax,rax
jne ##Error
mov rax, [rcx].CallerIP
sub rax, 5 // 5 bytes for call
push rax
call IsBadPointer
test rax,rax
pop rax
je ##Finish
##Error:
xor rax, rax
##Finish:
end;

Delphi - Detect Int64 Overflow Error

In Delphi how can I detect overflow errors for Int64?
For Integers we could do:
type
MyInt = Integer; //Int64
function TryMaxTimes10(out Res: MyInt): boolean;
var
a, b: MyInt;
begin
{$Q+}
try
a := High(MyInt);
b := 10;
Res := a * b; //REF1
Result := True;
except
Result := False;
end;
{$Q-}
end;
For MyInt = Integer, line REF1 gives an exception and so TryMaxTimes10 returns false.
But if we change MyInt to MyInt = Int64, then REF1 does not give an exception and TryMaxTimes10 returns true!
I understand that the help for {$Q+} does not specifically mention Int64: ... {$Q+} state, certain integer arithmetic operations ... are checked for overflow.
QUESTION: So my question is, how can we detect overflow errors for Int64?
(I'm using Delphi 7. Does the same thing happen in newer versions of Delphi?)
This is a known issue. See http://qc.embarcadero.com/wc/qcmain.aspx?d=10185, and the comments Andy wrote at the bottom.
My suggestion would be to create a function (I did not compile nor test this - just an example):
function Foo(A, B : Int64) : Int64;
var bNeg : boolean;
begin
// Do we expect a negative result?
bNeg := ((a < 0) xor (b < 0));
// Get the real result
Result := a * b;
// If the result is wrong, raise an error
if ((Result < 0) xor bNeg) then begin
// Raise EOverFlow
end;
end;
This bug has been fixed in RAD Studio 10.2 Tokyo.
The issue can be found here (but one have to log in with embarcadero account to see it).
Here is correct version of __llmulo by John O'Harrow (licensed under MPL 1.1) shipped with Delphi versions 10.2 and above:
// Param 1(edx:eax), Param 2([esp+8]:[esp+4])
// Result is stored in edx:eax
// O-flag set on exit => result is invalid
// O-flag clear on exit => result is valid
procedure __llmulo();
asm
test edx, edx {Param1-Hi = 0?}
jne ##Large {No, More than one multiply may be needed}
cmp edx, [esp+8] {Param2-Hi = 0?}
jne ##Large {No, More than one multiply may be needed}
mul dword ptr [esp+4] {Only one multiply needed, Set Result}
and eax, eax {Clear Overflow Flag}
ret 8
##Large:
sub esp, 28 {allocate local storage}
mov [esp], ebx {save used registers}
mov [esp+4], esi
mov [esp+8], edi
mov [esp+12], ebp
mov ebx, [esp+32] {Param2-Lo}
mov ecx, [esp+36] {Param2-Hi}
mov esi, edx
mov edi, ecx
sar esi, 31
sar edi, 31
xor eax, esi
xor edx, esi
sub eax, esi
sbb edx, esi {edx:eax (a1:a0) = abs(Param1)}
xor ebx, edi
xor ecx, edi
sub ebx, edi
sbb ecx, edi {ecx:ebx (b1:b0) = abs(Param2)}
xor esi, edi {Sign Flag, 0 if Params have same sign else -1}
mov [esp+16], eax {a0}
mov [esp+20], edx {a1}
mov [esp+24], ecx {b1}
mul ebx {edx:eax (c1:c0) = a0*b0}
xchg ebx, edx {ebx = c1, edx = b0}
mov edi, eax {abs(Result-Lo) = c0}
xor ecx, ecx {Upper 32 bits of 128 bit result}
xor ebp, ebp {Second 32 bits of 128 bit result}
mov eax, [esp+20] {a1}
mul edx {edx:eax (d1:d0) = a1*b0}
add ebx, eax {c1 + d0}
adc ebp, edx {d1 + carry}
adc ecx, 0 {Possible carry into Upper 32 bits}
mov eax, [esp+16] {a0}
mov edx, [esp+24] {b1}
mul edx {edx:eax (e1:e0) = a0*b1}
add ebx, eax {abs(Result-Hi) = c1 + d0 + e0}
adc ebp, edx {d1 + e1 + carry}
adc ecx, 0 {Possible carry into Upper 32 bits}
mov eax, [esp+20] {a1}
mov edx, [esp+24] {b1}
mul edx {edx:eax (f1:f0) = a1*b1}
add ebp, eax {d1 + e1 + f0 + carry}
adc ecx, edx {f1 + carry}
or ecx, ebp {Overflow if ecx <> 0 or ebp <> 0}
jnz ##Overflow
mov edx, ebx {Set abs(Result-Hi)}
mov eax, edi {Set abs(Result-Lo)}
cmp edx, $80000000
jae ##CheckRange {Possible Overflow if edx>=$80000000}
##SetSign:
xor eax, esi {Correct Sign of Result}
xor edx, esi
sub eax, esi
sbb edx, esi
mov ebx, [esp] {restore used registers}
mov esi, [esp+4]
mov edi, [esp+8]
mov ebp, [esp+12]
add esp, 28 {Clears Overflow flag}
ret 8
##CheckRange:
jne ##Overflow {Overflow if edx>$80000000}
test esi, esi {edx=$80000000, Is Sign Flag=0?}
jnz ##SetSign {No, Result is Ok (-MaxInt64)}
##Overflow:
mov ebx, [esp] {restore used registers}
mov esi, [esp+4]
mov edi, [esp+8]
mov ebp, [esp+12]
add esp, 28
mov ecx, $80000000
dec ecx {Set Overflow Flag}
ret 8
end;

Delphi XE2 assembly

I have the following function that works in Delphi 2006, but under Delphi XE2 it gives either an access violation error or a privileged instruction error when processing RET.
function Q_TrimChar(const S: string; Ch: Char): string;
asm
PUSH ESI
MOV ESI,ECX
TEST EAX,EAX
JE ##qt
MOV ECX,[EAX-4]
TEST ECX,ECX
JE ##qt
PUSH EBX
PUSH EDI
MOV EBX,EAX
MOV EDI,EDX
XOR EDX,EDX
MOV EAX,ESI
CALL System.#LStrFromPCharLen
MOV EDX,EDI
MOV ECX,[EBX-4]
##lp1: CMP DL,BYTE PTR [EBX]
JNE ##ex1
INC EBX
DEC ECX
JNE ##lp1
MOV EDX,[ESI]
JMP ##wq
##ex1: DEC ECX
##lp2: CMP DL,BYTE PTR [EBX+ECX]
JNE ##ex2
DEC ECX
JMP ##lp2
##ex2: MOV EDI,[ESI]
LEA EDX,[EDI+ECX+1]
##lp3: MOV AL,BYTE PTR [EBX+ECX]
MOV BYTE PTR [EDI+ECX],AL
DEC ECX
JNS ##lp3
##wq: MOV EAX,[ESI]
MOV BYTE PTR [EDX],0
SUB EDX,EAX
MOV [EAX-4],EDX
POP EDI
POP EBX
POP ESI
RET
##qt: MOV EAX,ESI
CALL System.#LStrClr
POP ESI
end;
I don't know assembly very well. What is the problem?
I completely agree with David's suggestion to simply code this in Pascal and have upvoted that answer. Unless profiling has indicated that this is a true bottleneck then there's really no need for the ASM. Here are two versions. The first is easier to read but the second is more efficient:
function Q_TrimChar(const S: string; Ch: Char): string;
begin
result := S;
while (result <> '') and (result[1] = Ch) do Delete(Result, 1, 1);
while (result <> '') and (result[Length(Result)] = Ch) do Delete(Result, Length(Result), 1);
end;
function Q_TrimChar(const S: string; Ch: Char): string;
var
First, Last : integer;
begin
First := 1;
Last := Length(S);
while (First < Last) and (S[First] = Ch) do inc(First);
while (Last >= First) and (S[Last] = Ch) do Dec(Last);
Result := copy(S, First, Last-First+1);
end;
Delphi 2006 uses single byte ANSI characters and so string is AnsiString, Char is AnsiChar. On Delphi 2009 and later, two byte Unicode characters are used. This function cannot possibly work on both compilers.
Even the standard hack of using AnsiString and AnsiChar does not work. Most likely the assumptions that this function makes about the RTL implementation are no longer valid in modern Delphi.
I would re-write this function in Pascal and let the compiler do the work. Not only will that be the quickest way to solve your current problem, it will also get you over the hurdle of 64-bit compilation should you ever choose to tackle that.

Delphi label and asm weirdness?

I written an asm function in Delphi 7 but it transforms my code to something else:
function f(x: Cardinal): Cardinal; register;
label err;
asm
not eax
mov edx,eax
shr edx, 1
and eax, edx
bsf ecx, eax
jz err
mov eax, 1
shl eax, cl
mov edx, eax
add edx, edx
or eax, edx
ret
err:
xor eax, eax
end;
// compiled version
f:
push ebx // !!!
not eax
mov edx,eax
shr edx, 1
and eax, edx
bsf ecx, eax
jz +$0e
mov eax, 1
shl eax, cl
mov edx, eax
add edx, edx
or eax, edx
ret
err:
xor eax, eax
mov eax, ebx // !!!
pop ebx // !!!
ret
// the almost equivalent without asm
function f(x: Cardinal): Cardinal;
var
c: Cardinal;
begin
x := not x;
x := x and x shr 1;
if x <> 0 then
begin
c := bsf(x); // bitscanforward
x := 1 shl c;
Result := x or (x shl 1)
end
else
Result := 0;
end;
Why does it generate push ebx and pop ebx? And why does it do mov eax, ebx?
It seems that it generates the partial stack frame because of the mov eax, ebx.
This simple test generates mov eax, edx but doesn't generate that stack frame:
function asmtest(x: Cardinal): Cardinal; register;
label err;
asm
not eax
and eax, 1
jz err
ret
err:
xor eax, eax
end;
// compiled
asmtest:
not eax
and eax, $01
jz +$01
ret
xor eax, eax
mov eax, edx // !!!
ret
It seems that it has something to do with the label err. If I remove that I don't get the mov eax, * part.
Why does this happen?
Made a bug report on Quality Central.
The practical advice is: do not use label keyword in asm code, use ##-prefixed labels:
function f(x: Cardinal): Cardinal; register;
asm
not eax
mov edx,eax
shr edx, 1
and eax, edx
bsf ecx, eax
jz ##err
mov eax, 1
shl eax, cl
mov edx, eax
add edx, edx
or eax, edx
ret
##err:
xor eax, eax
end;
Updated:
I have not found the bug report in Basm area. It looks like a bug, but I have used BASM for many years and never thought about using label keyword such a way. In fact I never used label keyword in Delphi at all. :)
Well ... back then, in the Delphi-Manual, it used to say something about Compiler-Optimization and thealike-crazyness:
The Compiler generates Stackframes only for nested Routines, for Routines having local Variables and for Routines with Stack-Parameters
The auto-generated Initialization- and Finalizationcode for Routines includes:
PUSH EBP ; If Locals <> 0 or Params <> 0
MOV EBP,ESP ; If Locals <> 0 or Params <> 0
SUB ESP,Locals ; If Locals <> 0
...
MOV ESP,EBP ; If Locals <> 0
POP EBP ; If Locals <> 0 or Params <> 0
RET Params ; Always
If local Variables contain Variants, long Strings or Interfaces they are initialized with Null but aren't finalized afterwards.
Locals is the Size of local Variables, Params the Size of Parameters. If both Locals as well as Params are Null no Init-Code will be generated and the Finalizationcode only contains a RET-Intruction.
Maybe that has got something to do with it all...

Resources