Combining ASM with non-asm code (or SwapInt64 ASM function needed) - delphi

I need to process a file that is coming from the old Mac era (old Motorola CPU). The bytes are big endian so I have a function that swaps and Int64 to Intel little endian. The function is ASM and works on 32 bit CPU but not on 64. For 64 bit I have a different function that is not ASM. I want to combine the functions using IFDEF. Can I do this? Will it be a problem?
interface
function SwapInt64(Value: Int64): Int64; assembler;
implementation
{$IFDEF CPUx86}
function SwapInt64(Value: Int64): Int64; assembler; { Does not work on 64 bit } {
asm
MOV EDX,[DWORD PTR EBP + 12]
MOV EAX,[DWORD PTR EBP + 8]
BSWAP EAX
XCHG EAX,EDX
BSWAP EAX
end;
{$else}
function SwapInt64 (Value: Int64): Int64;
var P: PInteger;
begin
Result: = (Value shl 32) or (Value shr 32);
P: = #Result;
P ^: = (Swap (P ^) shl 16) or (Swap (P ^ shr 16));
Inc (P);
P ^: = (Swap (P ^) shl 16) or (Swap (P ^ shr 16));
end;
{$ENDIF}
I think the compiler will correctly compile/call the appropriate function no matter if one is ASM and the other is Pascal.

What you are proposing is perfectly fine. It is a quite reasonable approach.
If you want a 64 bit swap in asm, for x64, that's quite simple:
function SwapInt64(Value: Int64): Int64;
asm
MOV RAX,RCX
BSWAP RAX
end;
Combine this with the 32 bit version using conditional, as you have done in the question.
function SwapInt64(Value: Int64): Int64;
{$IF Defined(CPUX86)}
asm
MOV EDX,[DWORD PTR EBP + 12]
MOV EAX,[DWORD PTR EBP + 8]
BSWAP EAX
XCHG EAX,EDX
BSWAP EAX
end;
{$ELSEIF Defined(CPUX64)}
asm
MOV RAX,RCX
BSWAP RAX
end;
{$ELSE}
{$Message Fatal 'Unsupported architecture'}
{$ENDIF}
Or include a Pascal implementation in the {$ELSE} block.

The approach of swapping the bytes in a separate routine that cannot be inlined is a bit silly if performance is what you're after.
A better way to a assume you've got a block of data and all dword/qwords in it need to have their endianness changed.
This would look something like this.
For dwords
function SwapDWords(var Data; size: cardinal): boolean;
{ifdef CPUX64}
asm
//Data in RCX, Size in EDX
xor EAX,EAX //failure
test EDX,3
jz #MultipleOf4
#error:
ret
#MultipleOf4
neg EDX //Count up instead of down
jz #done
ADD RCX,RDX
#loop
mov R8d, [RCX+RDX]
bswap R8d
mov [RCX+RDX],R8d
add RDX,4 //add is faster than inc on modern processors
jnz #loop
#done:
inc EAX //success
ret
end;
For qwords
function SwapQWords(var Data; size: cardinal): boolean;
{ifdef CPUX64}
asm
//Data in RCX, Size in EDX
xor EAX,EAX //failure
test EDX,7
jz #MultipleOf8
#error:
ret
#MultipleOf8
neg EDX //Count up instead of down
jz #done
ADD RCX,RDX
#loop
mov R8, [RCX+RDX]
bswap R8
mov [RCX+RDX],R8
add RDX,8 //add is faster than inc on modern processors
jnz #loop
#done:
inc EAX //success
ret
end;
If you're already on 64 bit, then you have SSE2, and can use the 128-bit SSE registers.
Now you can process 4 dwords at a time, effectively unrolling the loop 4 times.
See: http://www.asmcommunity.net/forums/topic/?id=29743
movntpd xmm5,[RCX+RDX] //non-temporal move to avoid polluting the cache
movdqu xmm0, xmm5
movdqu xmm1, xmm5
pxor xmm5, xmm5
punpckhbw xmm0, xmm5 ; interleave '0' with bytes of original
punpcklbw xmm1, xmm5 ; so they become words
pshuflw xmm0, xmm0, 27 ; swap the words by shuffling
pshufhw xmm0, xmm0, 27 ;//27 = B00_01_10_11
pshuflw xmm1, xmm1, 27
pshufhw xmm1, xmm1, 27
packuswb xmm1, xmm0 ; make the words back into bytes.
movntpd [RCX+RDX], xmm1 //non-temporal move to keep the cache clean.

Simply use either LEToN() or BEtoN()
Use the LE variant if the data is little endian (e.g. 32 or 64-bits x86 mac, modern arm), use the BE if the source data (e.g. file from disk) is in big endian format.
Depending on the used architecture a swap or "nothing" will be inlined, usually a fairly optimal one for single conversions. For block oriented solutions see the posted SSE code (or Agner Fog's)

Related

Using FPC .o files in Delphi 2010

I've written quite a big library for matrix operations for Delphi and FPC.
There exists now an extension for this library for the Intel AVX extension but
I could only manage to get that compiled in FPC. My idea was to create
.o files in FPC which contains the AVX assembler codes and include these
files in Delphi. I tried to follow this question here:
Linking FPC .o files into Delphi
but without success. I was able to dump the function names and tried to import
these in the Delphi unit. The problem is that I always get an error saying that
the .o files is in the wrong format.
I use CodeTyphoon for compilation which internally uses FPC 3.1.1 and
Delphi2010 as a first try.
The code is once compiled in FPC and one time in Delphi using the approriate
ifdefs.
My base code looks like this (just an excerpt):
// ###################################################################
// #### This file is part of the mathematics library project, and is
// #### offered under the licence agreement described on
// #### http://www.mrsoft.org/
// ####
// #### Copyright:(c) 2011, Michael R. . All rights reserved.
// ####
// #### Unless required by applicable law or agreed to in writing, software
// #### distributed under the License is distributed on an "AS IS" BASIS,
// #### WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// #### See the License for the specific language governing permissions and
// #### limitations under the License.
// ###################################################################
unit AVXMatrixMultOperations;
interface
{$IFDEF CPUX64}
{$DEFINE x64}
{$ENDIF}
{$IFDEF cpux86_64}
{$DEFINE x64}
{$ENDIF}
{$IFNDEF x64}
uses MatrixConst;
{$IFNDEF FPC}
// this fails -> wrong object format
{$L '.\AVXPrecompiled\win32\AVXMatrixMultOperations.o'}
{$ENDIF}
// full matrix operations
procedure AVXMatrixMultAligned(dest : PDouble; const destLineWidth : TASMNativeInt; mt1, mt2 : PDouble; width1, height1, width2, height2 : TASMNativeInt; const LineWidth1, LineWidth2 : TASMNativeInt);
{$IFNDEF FPC} external '' name 'AVXMATRIXMULTOPERATIONS_$$_AVXMATRIXMULTALIGNED$crc2A67AB04'; {$ENDIF}
{$ENDIF}
implementation
{$IFDEF FPC} {$ASMMODE intel} {$ENDIF}
{$IFNDEF x64}
{$IFDEF FPC}
procedure AVXMatrixMultAligned(dest : PDouble; const destLineWidth : TASMNativeInt; mt1, mt2 : PDouble; width1, height1, width2, height2 : TASMNativeInt; const LineWidth1, LineWidth2 : TASMNativeInt);
var bytesWidth2, destOffset : TASMNativeInt;
iter : TASMNativeInt;
{$IFDEF FPC}
begin
{$ENDIF}
asm
// prolog - simulate stack
push ebx;
push edi;
push esi;
mov ecx, dest;
mov edi, width1;
imul edi, -8;
mov iter, edi;
sub mt1, edi;
//destOffset := destLineWidth - Width2*sizeof(double);
mov ebx, Width2;
shl ebx, 3;
mov eax, destLineWidth;
sub eax, ebx;
mov destOffset, eax;
//bytesWidth2 := width2*sizeof(double);
mov bytesWidth2, ebx;
// for y := 0 to height1 - 1 do
##foryloop:
// r12 -> counter to width2
mov esi, width2;
sub esi, 2;
jl #LastXColumn;
##forxloop:
// for x := 0 to width2 div 2 - 1
// esi: mt1 - width1*sizeof(double)
// mt2: mt2
mov edx, mt1;
mov ebx, mt2;
mov eax, iter;
mov edi, LineWidth2;
vxorpd ymm0, ymm0, ymm0;
vxorpd ymm1, ymm1, ymm1;
cmp eax, -32;
jg ##Innerloop2Begin;
// for z := 0 to width1 - 1do
// AVX part:
##InnerLoop1:
// 4x4 block
vmovapd xmm2, [ebx];
add ebx, edi;
vmovapd xmm4, xmm2;
vmovapd xmm3, [ebx];
add ebx, edi;
// shuffle so we can multiply
// swap such that we can immediately multiply
vmovlhps xmm2, xmm2, xmm3;
vmovhlps xmm3, xmm3, xmm4;
// next 4 elements
vmovapd xmm4, [ebx];
add ebx, edi;
vmovapd xmm6, xmm4;
vmovapd xmm5, [ebx];
add ebx, edi;
vmovapd ymm7, [edx + eax]
vmovlhps xmm4, xmm4, xmm5;
vmovhlps xmm5, xmm5, xmm6;
vinsertf128 ymm2, ymm2, xmm4, 1;
vinsertf128 ymm3, ymm3, xmm5, 1;
// now multiply and add
vmulpd ymm2, ymm2, ymm7;
vmulpd ymm3, ymm3, ymm7;
vaddpd ymm0, ymm0, ymm2;
vaddpd ymm1, ymm1, ymm3;
add eax, 32;
jl ##InnerLoop1;
vextractf128 xmm2, ymm0, 1;
vextractf128 xmm3, ymm1, 1;
vhaddpd xmm0, xmm0, xmm2;
vhaddpd xmm1, xmm1, xmm3;
test eax, eax;
jz ##InnerLoopEnd2;
##Innerloop2Begin:
// rest in single elements
##InnerLoop2:
vmovapd xmm2, [ebx];
add ebx, edi;
vmovddup xmm3, [edx + eax];
vmulpd xmm2, xmm2, xmm3;
vmovhlps xmm4, xmm4, xmm2;
vaddsd xmm0, xmm0, xmm2;
vaddsd xmm1, xmm1, xmm4;
add eax, 8;
jnz ##InnerLoop2;
##InnerLoopEnd2:
// finall horizontal addition
vhaddpd xmm0, xmm0, xmm1;
vmovapd [ecx], xmm0;
// increment the pointers
// inc(mt2), inc(dest);
//add dword ptr [mt2], 8;
add mt2, 16;
add ecx, 16;
// end for x := 0 to width2 div 2 - 1
sub esi, 2;
jge ##forxloop;
#LastXColumn:
cmp esi, -1;
jne #NextLine;
// last column of mt2
mov eax, iter;
mov ebx, mt2;
vxorpd xmm0, xmm0, xmm0;
#InnerLoop2:
vmovsd xmm1, [edx + eax];
vmovsd xmm2, [ebx];
vmulsd xmm1, xmm1, xmm2;
vaddsd xmm0, xmm0, xmm1;
add ebx, edi;
add eax, 8;
jnz #InnerLoop2;
vmovsd [ecx], xmm0;
add ecx, 8;
add mt2, 8;
#NextLine:
// dec(mt2, Width2);
// inc(PByte(mt1), LineWidth1);
// inc(PByte(dest), destOffset);
//mov ebx, bytesWidth2;
//sub dword ptr [mt2], ebx;
mov eax, bytesWidth2;
sub mt2, eax;
mov eax, LineWidth1;
add mt1, eax;
add ecx, destOffset;
// end for y := 0 to height1 - 1
//dec eax;
dec height1;
jnz ##foryloop;
// epilog
vzeroupper;
pop esi;
pop edi;
pop ebx;
end;
{$IFDEF FPC}
end;
{$ENDIF}
{$ENDIF}
{$ENDIF}
end.
Since there is a single function involved here, the easiest is IMHO to convert the FPC AVXMatrixMultOperations.o file directly.
Use the great Object file converter tool.
You may try to convert from one binary format to another, accepted by Delphi.
But I guess that the cleanest way is to convert it to asm:
objconv -fasm AVXMatrixMultOperations.o
It will create a AVXMatrixMultOperations.asm file, which could be used to replace the unknown AVX instructions by simple db ..,..,..,.. bytes. Typically, the generated .asm file has the assembler on the left side, and the raw hexadecimal bytes on the right side.
This is how I dealt with old Delphi compilers in my libraries, for instance:
function crc32csse42(crc: cardinal; buf: PAnsiChar; len: cardinal): cardinal;
asm // eax=crc, edx=buf, ecx=len
not eax
test ecx, ecx
jz #0
test edx, edx
jz #0
#3: test edx, 3
jz #8 // align to 4 bytes boundary
{$ifdef ISDELPHI2010}
crc32 eax, byte ptr[edx]
{$else}
db $F2, $0F, $38, $F0, $02
{$endif}
inc edx
....
So in your case, something like
{$ifdef FPC}
vinsertf128 ymm2, ymm2, xmm4, 1;
vinsertf128 ymm3, ymm3, xmm5, 1;
{$else}
db $xx,$yy,$zz
db $xx,$yy,$zz
{$endif}

Code detects MMX/SSE/AVX but not AVX2

I want to have universal way of detecting specific CPU features. For this task I've created this function which takes EAX leaf number,Register name and bit number and returns true or false. It works fine for MMX/SSEx/AVX (EAX=1) but it does not detect AVX2 (EAX=7).
CPU: i5-4670k
OS: Windows 7
DetectCPUFeature('1','EDX',23) //DETECTS MMX CORRECTLY
DetectCPUFeature('1','EDX',25) //DETECTS SSE CORRECTLY
DetectCPUFeature('1','EDX',26) //DETECTS SSE2 CORRECTLY
DetectCPUFeature('1','ECX',0) //DETECTS SSE3 CORRECTLY
DetectCPUFeature('1','ECX',9) //DETECTS SSSE3 CORRECTLY
DetectCPUFeature('1','ECX',19) //DETECTS SSE4.1 CORRECTLY
DetectCPUFeature('1','ECX',20) //DETECTS SSE4.2 CORRECTLY
DetectCPUFeature('1','ECX',28) //DETECTS AVX CORRECTLY
DetectCPUFeature('7','EBX',5) //DOES NOT DETECT AVX2!
.
function DetectCPUFeature(EAX_Leaf_HEX,Register_Name:string;Bit:byte):boolean;
var _eax,_ebx,_ecx,_edx,EAX_Leaf,_Result: Longword;
x:integer;
Binary_mask:string;
Decimal_mask:int64;
begin
EAX_Leaf:=HexToInt(EAX_Leaf_HEX);
Binary_mask:='1';
for x:=1 to Bit do Binary_mask:=Binary_mask+'0';
Decimal_mask:=BinToInt(Binary_mask);
if AnsiUpperCase(Register_Name)='EDX' then
begin
asm
mov eax,EAX_Leaf // https://en.wikipedia.org/wiki/CPUID
db $0F,$A2 // db $0F,$A2 = CPUID instruction
mov _Result,edx
end;
end;
if AnsiUpperCase(Register_Name)='ECX' then
begin
asm
mov eax,EAX_Leaf
db $0F,$A2
mov _Result,ecx
end;
end;
if AnsiUpperCase(Register_Name)='EBX' then
begin
asm
mov eax,EAX_Leaf
db $0F,$A2
mov _Result,ebx
end;
end;
if (_Result and Decimal_mask) = Decimal_mask then DetectCPUFeature:=true
else DetectCPUFeature:=false;
end;
This sort of code is very dubious, mixing asm with Pascal code. Your code, in the asm blocks modifies registers and fails to restore them. That could easily be conflicting with the compiler's register usage. My strong advice to you is that you should never mix asm and Pascal in this way. Always use pure Pascal or pure asm.
What you need is a function that will perform the CPUID instruction and return you all the registers in a structure. You can then pick out what you want from that using Pascal code.
In addition, as #J... points out, you need to specify the sub-leaf value in the ECX register before invoking the CPUID instruction. That is a requirement for a number of the more recently added CPUID arguments.
This is the function you need:
type
TCPUID = record
EAX: Cardinal;
EBX: Cardinal;
ECX: Cardinal;
EDX: Cardinal;
end;
function GetCPUID(Leaf, Subleaf: Cardinal): TCPUID;
asm
push ebx
push edi
mov edi, ecx
mov ecx, edx
cpuid
mov [edi+$0], eax
mov [edi+$4], ebx
mov [edi+$8], ecx
mov [edi+$c], edx
pop edi
pop ebx
end;
I've written this for 32 bit code, but if you need to support 64 bit code also that support is easy enough to add.
function GetCPUID(Leaf, Subleaf: Integer): TCPUID;
asm
{$IF Defined(CPUX86)}
push ebx
push edi
mov edi, ecx
mov ecx, edx
cpuid
mov [edi+$0], eax
mov [edi+$4], ebx
mov [edi+$8], ecx
mov [edi+$c], edx
pop edi
pop ebx
{$ELSEIF Defined(CPUX64)}
mov r9,rcx
mov ecx,r8d
mov r8,rbx
mov eax,edx
cpuid
mov [r9+$0], eax
mov [r9+$4], ebx
mov [r9+$8], ecx
mov [r9+$c], edx
mov rbx, r8
{$ELSE}
{$Message Fatal 'GetCPUID has not been implemented for this architecture.'}
{$IFEND}
end;
With this at hand you can call CPUID passing any value as input, and retrieve all 4 registers of output, with which you can then do whatever you please.
Your code to create a bitmask is extremely inefficient and very far from idiomatic. Use 1 shl N to create a value with a single bit set, in position N.
Code like this:
if (_Result and Decimal_mask) = Decimal_mask then DetectCPUFeature:=true
else DetectCPUFeature:=false;
is also some way from idiomatic. That would normally be written like this:
DetectCPUFeature := value and mask <> 0;
You might end up with a wrapper function that looks like this:
type
TCPUIDRegister = (regEAX, regEBX, regECX, regEDX);
function GetCPUIDRegister(CPUID: TCPUID; Reg: TCPUIDRegister): Cardinal;
begin
case Reg of
regEAX:
Result := CPUID.EAX;
regEBX:
Result := CPUID.EBX;
regECX:
Result := CPUID.ECX;
regEDX:
Result := CPUID.EDX;
end;
end;
function CPUFeatureEnabled(Leaf, Subleaf: Cardinal; Reg: TCPUIDRegister; Bit: Integer): Boolean;
var
value: Cardinal;
begin
value := GetCPUIDRegister(GetCPUID(Leaf, Subleaf), Reg);
Result := value and (1 shl Bit) <> 0;
end;
While David's answer is excellent, the reason the function fails is that the ECX register is not set to zero (required for fetching extended info in the CPUID call).
See : How to detect New Instruction support in the 4th generation Intel® Core™ processor family
where AVX2 is found by (emphasis mine)
CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]==1
The following correctly returns the extended information and identifies AVX2 support.
if AnsiUpperCase(Register_Name)='EBX' then
begin
asm
push ecx { push ecx to stack}
mov ecx, 0 { set ecx to zero}
mov eax,EAX_Leaf
db $0F,$A2
mov _Result,ebx
pop ecx { restore ecx}
end;
The other asm functions have the same error as ECX is required to be zero for those calls also.
taken from Synopse Informatique:
type
/// the potential features, retrieved from an Intel CPU
// - see https://en.wikipedia.org/wiki/CPUID#EAX.3D1:_Processor_Info_and_Feature_Bits
TALIntelCpuFeature =
( { in EDX }
cfFPU, cfVME, cfDE, cfPSE, cfTSC, cfMSR, cfPAE, cfMCE,
cfCX8, cfAPIC, cf_d10, cfSEP, cfMTRR, cfPGE, cfMCA, cfCMOV,
cfPAT, cfPSE36, cfPSN, cfCLFSH, cf_d20, cfDS, cfACPI, cfMMX,
cfFXSR, cfSSE, cfSSE2, cfSS, cfHTT, cfTM, cfIA64, cfPBE,
{ in ECX }
cfSSE3, cfCLMUL, cfDS64, cfMON, cfDSCPL, cfVMX, cfSMX, cfEST,
cfTM2, cfSSSE3, cfCID, cfSDBG, cfFMA, cfCX16, cfXTPR, cfPDCM,
cf_c16, cfPCID, cfDCA, cfSSE41, cfSSE42, cfX2A, cfMOVBE, cfPOPCNT,
cfTSC2, cfAESNI, cfXS, cfOSXS, cfAVX, cfF16C, cfRAND, cfHYP,
{ extended features in EBX, ECX }
cfFSGS, cf_b01, cfSGX, cfBMI1, cfHLE, cfAVX2, cf_b06, cfSMEP, cfBMI2,
cfERMS, cfINVPCID, cfRTM, cfPQM, cf_b13, cfMPX, cfPQE, cfAVX512F,
cfAVX512DQ, cfRDSEED, cfADX, cfSMAP, cfAVX512IFMA, cfPCOMMIT,
cfCLFLUSH, cfCLWB, cfIPT, cfAVX512PF, cfAVX512ER, cfAVX512CD,
cfSHA, cfAVX512BW, cfAVX512VL, cfPREFW1, cfAVX512VBMI);
/// all features, as retrieved from an Intel CPU
TALIntelCpuFeatures = set of TALIntelCpuFeature;
var
/// the available CPU features, as recognized at program startup
ALCpuFeatures: TALIntelCpuFeatures;
{**}
type
_TRegisters = record
eax,ebx,ecx,edx: cardinal;
end;
{***************************************************************}
procedure _GetCPUID(Param: Cardinal; var Registers: _TRegisters);
{$IF defined(CPU64BITS)}
asm // ecx=param, rdx=Registers (Linux: edi,rsi)
.NOFRAME
mov eax, ecx
mov r9, rdx
mov r10, rbx // preserve rbx
xor ebx, ebx
xor ecx, ecx
xor edx, edx
cpuid
mov _TRegisters(r9).&eax, eax
mov _TRegisters(r9).&ebx, ebx
mov _TRegisters(r9).&ecx, ecx
mov _TRegisters(r9).&edx, edx
mov rbx, r10
end;
{$else}
asm
push esi
push edi
mov esi, edx
mov edi, eax
pushfd
pop eax
mov edx, eax
xor eax, $200000
push eax
popfd
pushfd
pop eax
xor eax, edx
jz #nocpuid
push ebx
mov eax, edi
xor ecx, ecx
cpuid
mov _TRegisters(esi).&eax, eax
mov _TRegisters(esi).&ebx, ebx
mov _TRegisters(esi).&ecx, ecx
mov _TRegisters(esi).&edx, edx
pop ebx
#nocpuid:
pop edi
pop esi
end;
{$ifend}
{******************************}
procedure _TestIntelCpuFeatures;
var regs: _TRegisters;
begin
regs.edx := 0;
regs.ecx := 0;
_GetCPUID(1,regs);
PIntegerArray(#ALCpuFeatures)^[0] := regs.edx;
PIntegerArray(#ALCpuFeatures)^[1] := regs.ecx;
_GetCPUID(7,regs);
PIntegerArray(#ALCpuFeatures)^[2] := regs.ebx;
PByteArray(#ALCpuFeatures)^[12] := regs.ecx;
end;
initialization
_TestIntelCpuFeatures;

E2107 Operand size mismatch

I use this functions since D2007 I got it online, don't remember where.
But now in XE7 it return a compilation error:
"E2107 Operand size mismatch"
function FastCharPos(const aSource : string; const C: Char; StartPos : Integer) : Integer;
var
L : Integer;
begin
//If this assert failed, it is because you passed 0 for StartPos, lowest value is 1 !!
Assert(StartPos > 0);
Result := 0;
L := Length(aSource);
if L = 0 then exit;
if StartPos > L then exit;
Dec(StartPos);
asm
PUSH EDI //Preserve this register
mov EDI, aSource //Point EDI at aSource
add EDI, StartPos
mov ECX, L //Make a note of how many chars to search through
sub ECX, StartPos
mov AL, C //and which char we want :Error -"E2107 Operand size mismatch"
#Loop:
cmp Al, [EDI] //compare it against the SourceString
jz #Found
inc EDI
dec ECX
jnz #Loop
jmp #NotFound
#Found:
sub EDI, aSource //EDI has been incremented, so EDI-OrigAdress = Char pos !
inc EDI
mov Result, EDI
#NotFound:
POP EDI
end;
end;
function FastCharPosNoCase(const aSource : string; C: Char; StartPos : Integer) : Integer;
var
L : Integer;
begin
Result := 0;
L := Length(aSource);
if L = 0 then exit;
if StartPos > L then exit;
Dec(StartPos);
if StartPos < 0 then StartPos := 0;
asm
PUSH EDI //Preserve this register
PUSH EBX
mov EDX, GUpcaseLUT
mov EDI, aSource //Point EDI at aSource
add EDI, StartPos
mov ECX, L //Make a note of how many chars to search through
sub ECX, StartPos
xor EBX, EBX
mov BL, C //:Error -"E2107 Operand size mismatch"
mov AL, [EDX+EBX]
#Loop:
mov BL, [EDI]
inc EDI
cmp Al, [EDX+EBX]
jz #Found
dec ECX
jnz #Loop
jmp #NotFound
#Found:
sub EDI, aSource //EDI has been incremented, so EDI-OrigAdress = Char pos !
mov Result, EDI
#NotFound:
POP EBX
POP EDI
end;
end;
What do I need to update these two functions to XE7 win32?
What must I do?
Thanks.
This code was written for pre Unicode Delphi where Char is an alias for AnsiChar, the 8 bit character type. In Delphi 2009 and later, Char is an alias for WideChar the 16 bit character type.
The reason for the error message is that the code is intended to operate on 8 bit character elements, but you are providing 16 bit operands. The operator expects 8 bit operands, but you supplied 16 bit operands.
Change Char to AnsiChar to make this code compile and behave as intended on all versions of Delphi.
Having said that, I suggest you stop using this code. Instead use Pos. As a rule, it is preferable to use built-in library functions.
You should stop using old assembler version for string routines and use the use built-in library functions.
If you want to move on in a hurry you can reimplement you functions like this:
function FastCharPos(const aSource: string; const C: Char; StartPos: Integer): Integer; inline;
begin
Result := Pos(C, aSource, StartPos);
end;
function FastCharPosNoCase(const aSource: string; C: Char; StartPos: Integer): Integer; inline;
begin
Result := Pos(AnsiUppercase(C), AnsiUppercase(aSource), StartPos);
end;

What does DUnit2's CallerAddr function do, and how do I convert it to 64 bits?

I am trying to get DUnit2 working under 64 bits, but I am stumped to what this method does, let alone how to convert it to 64 bits. Pure Pascal would better, but since it refers to the stack (ebp), it might not be possible.
function CallerAddr: Pointer; assembler;
const
CallerIP = $4;
asm
mov eax, ebp
call IsBadPointer
test eax,eax
jne ##Error
mov eax, [ebp].CallerIP
sub eax, 5 // 5 bytes for call
push eax
call IsBadPointer
test eax,eax
pop eax
je ##Finish
##Error:
xor eax, eax
##Finish:
end;
function RtlCaptureStackBackTrace(FramesToSkip: ULONG; FramesToCapture: ULONG;
out BackTrace: Pointer; BackTraceHash: PULONG): USHORT; stdcall;
external 'kernel32.dll' name 'RtlCaptureStackBackTrace' delayed;
function CallerAddr: Pointer;
begin
// Skip 2 Frames, one for the return of CallerAddr and one for the
// return of RtlCaptureStackBackTrace
if RtlCaptureStackBackTrace(2, 1, Result, nil) > 0 then
begin
if not IsBadPointer(Result) then
Result := Pointer(NativeInt(Result) - 5)
else
Result := nil;
end
else
begin
Result := nil;
end;
end;
function CallerAddr: Pointer; assembler;
const
CallerIP = $4;
asm
mov rax, rcx ;For int.. XMM0 for float
call IsBadPointer
test rax,rax
jne ##Error
mov rax, [rcx].CallerIP
sub rax, 5 // 5 bytes for call
push rax
call IsBadPointer
test rax,rax
pop rax
je ##Finish
##Error:
xor rax, rax
##Finish:
end;

Delphi XE2 assembly

I have the following function that works in Delphi 2006, but under Delphi XE2 it gives either an access violation error or a privileged instruction error when processing RET.
function Q_TrimChar(const S: string; Ch: Char): string;
asm
PUSH ESI
MOV ESI,ECX
TEST EAX,EAX
JE ##qt
MOV ECX,[EAX-4]
TEST ECX,ECX
JE ##qt
PUSH EBX
PUSH EDI
MOV EBX,EAX
MOV EDI,EDX
XOR EDX,EDX
MOV EAX,ESI
CALL System.#LStrFromPCharLen
MOV EDX,EDI
MOV ECX,[EBX-4]
##lp1: CMP DL,BYTE PTR [EBX]
JNE ##ex1
INC EBX
DEC ECX
JNE ##lp1
MOV EDX,[ESI]
JMP ##wq
##ex1: DEC ECX
##lp2: CMP DL,BYTE PTR [EBX+ECX]
JNE ##ex2
DEC ECX
JMP ##lp2
##ex2: MOV EDI,[ESI]
LEA EDX,[EDI+ECX+1]
##lp3: MOV AL,BYTE PTR [EBX+ECX]
MOV BYTE PTR [EDI+ECX],AL
DEC ECX
JNS ##lp3
##wq: MOV EAX,[ESI]
MOV BYTE PTR [EDX],0
SUB EDX,EAX
MOV [EAX-4],EDX
POP EDI
POP EBX
POP ESI
RET
##qt: MOV EAX,ESI
CALL System.#LStrClr
POP ESI
end;
I don't know assembly very well. What is the problem?
I completely agree with David's suggestion to simply code this in Pascal and have upvoted that answer. Unless profiling has indicated that this is a true bottleneck then there's really no need for the ASM. Here are two versions. The first is easier to read but the second is more efficient:
function Q_TrimChar(const S: string; Ch: Char): string;
begin
result := S;
while (result <> '') and (result[1] = Ch) do Delete(Result, 1, 1);
while (result <> '') and (result[Length(Result)] = Ch) do Delete(Result, Length(Result), 1);
end;
function Q_TrimChar(const S: string; Ch: Char): string;
var
First, Last : integer;
begin
First := 1;
Last := Length(S);
while (First < Last) and (S[First] = Ch) do inc(First);
while (Last >= First) and (S[Last] = Ch) do Dec(Last);
Result := copy(S, First, Last-First+1);
end;
Delphi 2006 uses single byte ANSI characters and so string is AnsiString, Char is AnsiChar. On Delphi 2009 and later, two byte Unicode characters are used. This function cannot possibly work on both compilers.
Even the standard hack of using AnsiString and AnsiChar does not work. Most likely the assumptions that this function makes about the RTL implementation are no longer valid in modern Delphi.
I would re-write this function in Pascal and let the compiler do the work. Not only will that be the quickest way to solve your current problem, it will also get you over the hurdle of 64-bit compilation should you ever choose to tackle that.

Resources