I have a Delphi Firemonkey EXIF implementation I'm using in a routine to load image files. I'm trying to determine whether or not the image has been rotated, so I can correct the orientation of the image before displaying it. This routine, in part calls assembly code that executes a BSWAP to determine where header information in the image file is located. Here is a part of the code:
type
TMarker = packed record
Marker : Word; //Section marker
Len : Word; //Length Section
Indefin : Array [0..4] of Char; //Indefiner - "Exif" 00, "JFIF" 00 and ets
Pad : Char; //0x00
end;
TIFDHeader = packed record
pad : Byte; //00h
ByteOrder : Word; //II (4D4D) or MM
i42 : Word; //2A00 (magic number from the 'Hitchhikers Guide'
Offset : Cardinal; //0th offset IFD
Count : Word; // number of IFD entries
end;
function SwapLong(Value: Cardinal): Cardinal;
asm bswap eax end;
procedure TExif.ReadFromFile(const FileName: string);
var
j: TMarker;
ifd: TIFDHeader;
off0: Cardinal; //Null Exif Offset
SOI: Word; //2 bytes SOI marker. FF D8 (Start Of Image)
f: File;
begin
if not FileExists(FileName) then exit;
Init;
System.FileMode:=0; //Read Only open
AssignFile(f,FileName);
reset(f,1);
BlockRead(f,SOI,2);
if SOI=$D8FF then begin //Is this Jpeg
BlockRead(f,j,9);
if j.Marker=$E0FF then begin //JFIF Marker Found
Seek(f,20); //Skip JFIF Header
BlockRead(f,j,9);
end;
//Search Exif start marker;
if j.Marker<>$E1FF then begin
i:=0;
repeat
BlockRead(f,SOI,2); //Read bytes.
inc(i);
until (EOF(f) or (i>1000) or (SOI=$E1FF));
//If we find maker
if SOI=$E1FF then begin
Seek(f,FilePos(f)-2); //return Back on 2 bytes
BlockRead(f,j,9); //read Exif header
end;
end;
if j.Marker=$E1FF then begin //If we found Exif Section. j.Indefin='Exif'.
FValid:=True;
off0:=FilePos(f)+1; //0'th offset Exif header
BlockRead(f,ifd,11); //Read IDF Header
FSwap := ifd.ByteOrder=$4D4D; // II or MM - if MM we have to swap
if FSwap then begin
ifd.Offset := SwapLong(ifd.Offset);
ifd.Count := Swap(ifd.Count);
end;
if ifd.Offset <> 8 then begin
Seek(f, FilePos(f)+abs(ifd.Offset)-8);
end;
This works fine when the application is built for 32-bit Windows, but fails at the SwapLong call under 64-bit Windows. I don't know the first thing about Assembly language and so I'm looking for how to handle the same functionality when building the 64-bit version of the program. Just as a note, in both versions the idf.OffSet value passed to the SwapLong function is 134217728 ($08000000). In the 32-bit version the SwapLong returns a value of 8, but the 64-bit version returns a value of 2694969615 given what appears to be the same input.
I need the 64-bit version to work as I am looking to target 64-bit MAC OSX with the same code. Any help would be greatly appreciated.
The issue exists because the inline assembly assumes the first argument as well as the return value to be using register eax, which is true for Delphi in 32-bit mode as per Delphi's calling convention (and although the inline assembly documentation states that there shouldn't be made any assumptions about registers other than ebp and esp, this always held true even inside of inline assembly statements when they were placed at the top of a function).
However, 64-bit mode uses a different calling convention in which the first argument is in rcx and the return value is using rax. So here you are getting random uninitialized garbage as return value that happened to be in that register (with its bytes swapped) because it's never explicitly set.
The best, portable solution would be to implement the byte swap in pure Pascal without inline assembly:
function SwapLong(Value: Cardinal): Cardinal;
begin
Result := Swap(Value shr 16) or (Cardinal(Swap(Value)) shl 16);
end;
This uses the decades-old Swap function which swaps the lower 2 bytes of a value. This isn't of much use on its own anymore but it can be utilized twice (together with some bit shifting and masking) to shorten code for swapping all 4 bytes of a 32-bit value.
Another way which has more source code but can produce less convoluted assembly code as a result would be accessing the individual bytes in the Cardinal using byte pointers:
function SwapLong(Value: Cardinal): Cardinal; inline;
begin
PByte(#Result)^ := PByte(NativeUInt(#Value) + 3)^;
PByte(NativeUInt(#Result) + 1)^ := PByte(NativeUInt(#Value) + 2)^;
PByte(NativeUInt(#Result) + 2)^ := PByte(NativeUInt(#Value) + 1)^;
PByte(NativeUInt(#Result) + 3)^ := PByte(#Value)^;
end;
64-bit assembly passes parameters in different registers than 32-bit. In this case, parameter will be in ECX register, and return value needs to be in EAX.
That requires different code for 32-bit and 64-bit assembly.
function SwapLong(Value: Cardinal): Cardinal;
{$IFDEF ASSEMBLER}
{$IFDEF CPUX86}
asm
bswap eax
end;
{$ENDIF CPUX86}
{$IFDEF CPUX64}
asm
mov eax, ecx
bswap eax
end;
{$ENDIF CPUX64}
{$ELSE}
begin
// pascal version
end;
{$ENDIF}
Since inline assembly is only available on Windows, other platforms need pure pascal code as shown in CherryDT's answer
Related
During a short look at the source code of system.math, I discovered that
the 64-bit version Delphi Tokyo 10.2.3 flushes denormal IEEE-Doubles to zero, as can be seen from then following program;
{$apptype console}
uses
system.sysutils, system.math;
var
x: double;
const
twopm1030 : UInt64 = $0000100000000000; {2^(-1030)}
begin
x := PDouble(#twopm1030)^;
writeln(x);
x := ldexp(1,-515);
writeln(x*x);
x := ldexp(1,-1030);
writeln(x);
end.
For 32-bit the output is as expected
8.69169475979376E-0311
8.69169475979376E-0311
8.69169475979376E-0311
but with 64-bit I get
8.69169475979375E-0311
0.00000000000000E+0000
0.00000000000000E+0000
So basically Tokyo can handle denormal numbers in 64-bit mode, the constant is written correctly, but from arithmetic operations or even with ldexp a denormal result is flushed to zero.
Can this observation be confirmed on other systems? If yes, where it is documented? (The only info I could find about zero-flushing is,
that Denormals become zero when stored in a Real48).
Update: I know that for both 32- and 64-bit the single overload is used. For 32-bit the x87 FPU is used and the ASM code is virtually identical for all precisions (single, double, extended). The FPU always returns a 80-bit extended which is stored in a double without premature truncation. The 64-bit code does precision adjustment before storing.
Meanwhile I filed an issue report (https://quality.embarcadero.com/browse/RSP-20925), with the focus on the inconsistent results for 32- or 64-bit.
Update:
There is only a difference in how the compiler treats the overloaded selection.
As #Graymatter found out, the LdExp overload called is the Single type for both the 32-bit and the 64-bit compiler. The only difference is the codebase, where the 32-bit compiler is using asm code, while the 64-bit compiler has a purepascal implementation.
To fix the code to use the correct overload, explicitly define the type for the LdExp() first argument like this it works (64-bit):
program Project116;
{$APPTYPE CONSOLE}
uses
system.sysutils, system.math;
var
x: double;
const
twopm1030 : UInt64 = $0000100000000000; {2^(-1030)}
begin
x := PDouble(#twopm1030)^;
writeln(x);
x := ldexp(Double(1),-515);
writeln(x*x);
x := ldexp(Double(1),-1030);
writeln(x);
ReadLn;
end.
Outputs:
8.69169475979375E-0311
8.69169475979375E-0311
8.69169475979375E-0311
I would say that this behaviour should be reported as a RTL bug, since the overloaded function selected in your case is the Single type. The resulting type is a Double and the compiler should definitely adapt accordingly.
since the 32-bit and the 64-bit compiler should produce the same result.
Note, the Double(1) typecast for floating point types, was introduced in Delphi 10.2 Tokyo. For solutions in prevoius versions, see What is first version of Delphi which allows typecasts like double(10).
The problem here is that Ldexp(single) is returning different results depending on whether the ASM code is being called or whether the pascal code is called. In both cases, the compiler is calling the Single version of the overload because the type isn't specified in the call.
Your pascal code which is executed in the Win64 scenario tries to deal with the exponent less than -126 but the method is still not able to correctly calculate the result because single numbers are limited to an 8 bit exponent. The assembler seems to get around this but I didn't look into it in much detail as to why that's the case.
function Ldexp(const X: Single; const P: Integer): Single;
{ Result := X * (2^P) }
{$IFNDEF X86ASM}
var
T: Single;
I: Integer;
const
MaxExp = 127;
MinExp = -126;
FractionOfOne = $00800000;
begin
T := X;
Result := X;
case T.SpecialType of
fsDenormal,
fsNDenormal,
fsPositive,
fsNegative:
begin
FClearExcept;
I := P;
if I > MaxExp then
begin
T.BuildUp(False, FractionOfOne, MaxExp);
Result := Result * T;
I := I - MaxExp;
if I > MaxExp then I := MaxExp;
end
else if I < MinExp then
begin
T.BuildUp(False, FractionOfOne, MinExp);
Result := Result * T;
I := I - MinExp;
if I < MinExp then I := MinExp;
end;
if I <> 0 then
begin
T.BuildUp(False, FractionOfOne, I);
Result := Result * T;
end;
FCheckExcept;
end;
// fsZero,
// fsNZero,
// fsInf,
// fsNInf,
// fsNaN:
else
;
end;
end;
{$ELSE X86ASM}
{$IF defined(CPUX86) and defined(IOS)} // iOS/Simulator
...
{$ELSE}
asm // StackAlignSafe
PUSH EAX
FILD dword ptr [ESP]
FLD X
FSCALE
POP EAX
FSTP ST(1)
FWAIT
end;
{$ENDIF}
{$ENDIF X86ASM}
As LU RD suggested, you can get around the problem by forcing the methods to call the Double overload. There is a bug but that bug is that the ASM code doesn't match the pascal code in Ldexp(const X: Single; const P: Integer), not that a different overload is being called.
I use FNV to hash a file in my 32 bit OS. Why the hash result of same code is different if I use the code in 64 bit OS?
Here's my code:
function gethash(const dwOffset: PByteArray; const dwLen: DWORD;
const offset_basis: DWORD): DWORD;
var
i: integer;
begin
Result := offset_basis;
try
{$R-}{$I-}
for i := 0 to dwLen - 1 do
Result := (Result * 16777619) xor DWORD(dwOffset^[i]);
{$R+}{$I+}
except
;
end;
end;
This code will produce the same output irrespective of the bitness of the operating system. Furthermore the output is independent of the bitness of the process. That is, if you compile for 32 bit and 64 bit, the output will be the same.
The logical conclusion therefore, is that the different output is caused by supplying different input.
One might ask why you have a swallow all exception handler in your code. That must be a really bad idea. If you supply incorrect parameters which lead to an access violation, you'll never find out. I urge you to remove that exception handler.
To demonstrate my point, I offer the following simple test program:
{$APPTYPE CONSOLE}
uses
Winapi.Windows,
System.SysUtils;
{$R-}{$I-}
function gethash(const dwOffset: PByteArray; const dwLen: DWORD;
const offset_basis: DWORD): DWORD;
var
i: integer;
begin
Result := offset_basis;
for i := 0 to dwLen - 1 do
Result := (Result * 16777619) xor DWORD(dwOffset^[i]);
end;
procedure Main;
var
i: Integer;
buf: TBytes;
begin
SetLength(buf, 666);
for i := 0 to high(buf) do
buf[i] := (i+13)*7 mod 256;
Writeln(gethash(PByteArray(buf), Length(buf), 17));
end;
begin
Main;
Readln;
end.
This produces the same output on every operating system, and the same output when compiled by either the 32 bit or 64 bit compiler. Whilst this does not prove that will be the case for every possible input, it gives you a test bed. You can replace my input with your own and you will discover, if I am right, that the output will always be the same, for fixed input.
One plausible explanation for your problem is that you are reading beyond the end of the buffer and so hashing ill-defined input data. And perhaps with a different OS or compiler, that ill-defined input data differs.
Following on from this question (Dynamic arrays and memory management in Delphi), if I create a dynamic array in Delphi, how do I access the reference count?
SetLength(a1, 100);
a2 := a1;
// The reference count for the array pointed to by both
// a1 and a2 should be 2. How do I retrieve this?
Additionally, if the reference count can be accessed, can it also be modified manually? This latter question is mainly theoretical rather than for use practically (unlike the first question above).
You can see how the reference count is managed by inspecting the code in the System unit. Here are the pertinent parts from the XE3 source:
type
PDynArrayRec = ^TDynArrayRec;
TDynArrayRec = packed record
{$IFDEF CPUX64}
_Padding: LongInt; // Make 16 byte align for payload..
{$ENDIF}
RefCnt: LongInt;
Length: NativeInt;
end;
....
procedure _DynArrayAddRef(P: Pointer);
begin
if P <> nil then
AtomicIncrement(PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt);
end;
function _DynArrayRelease(P: Pointer): LongInt;
begin
Result := AtomicDecrement(PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt);
end;
A dynamic array variable holds a pointer. If the array is empty, then the pointer is nil. Otherwise the pointer contains the address of the first element of the array. Immediately before the first element of the array is stored the metadata for the array. The TDynArrayRec type describes that metadata.
So, if you wish to read the reference count you can use the exact same technique as does the RTL. For instance:
function DynArrayRefCount(P: Pointer): LongInt;
begin
if P <> nil then
Result := PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt
else
Result := 0;
end;
If you want to modify the reference count then you can do so by exposing the functions in System:
procedure DynArrayAddRef(P: Pointer);
asm
JMP System.#DynArrayAddRef
end;
function DynArrayRelease(P: Pointer): LongInt;
asm
JMP System.#DynArrayRelease
end;
Note that the name DynArrayRelease as chosen by the RTL designers is a little mis-leading because it merely reduces the reference count. It does not release memory when the count reaches zero.
I'm not sure why you would want to do this mind you. Bear in mind that once you start modifying the reference count, you have to take full responsibility for getting it right. For instance, this program leaks:
{$APPTYPE CONSOLE}
var
a, b: array of Integer;
type
PDynArrayRec = ^TDynArrayRec;
TDynArrayRec = packed record
{$IFDEF CPUX64}
_Padding: LongInt; // Make 16 byte align for payload..
{$ENDIF}
RefCnt: LongInt;
Length: NativeInt;
end;
function DynArrayRefCount(P: Pointer): LongInt;
begin
if P <> nil then
Result := PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt
else
Result := 0;
end;
procedure DynArrayAddRef(P: Pointer);
asm
JMP System.#DynArrayAddRef
end;
function DynArrayRelease(P: Pointer): LongInt;
asm
JMP System.#DynArrayRelease
end;
begin
ReportMemoryLeaksOnShutdown := True;
SetLength(a, 1);
Writeln(DynArrayRefCount(a));
b := a;
Writeln(DynArrayRefCount(a));
DynArrayAddRef(a);
Writeln(DynArrayRefCount(a));
a := nil;
Writeln(DynArrayRefCount(b));
b := nil;
Writeln(DynArrayRefCount(b));
end.
And if you make a call to DynArrayRelease that takes the reference count to zero then you would also need to dispose of the array, for reasons discussed above. I've never encountered a problem that would require manipulation of the reference count, and strongly suggest that you avoid doing so.
One final point. The RTL does not offer this functionality through its public interface. Which means that all of the above is private implementation detail. And so is subject to change in a future release. If you do attempt to read or modify the reference count then you must recognise that doing so relies on such implementation detail.
After some googling, I found an excellent article by Rudy Velthuis. I highly recommend to read it. Quoting dynamic arrays part from http://rvelthuis.de/articles/articles-pointers.html#dynarrays
At the memory location below the address to which the pointer points, there are two more fields, the number of elements allocated, and the reference count.
If, as in the diagram above, N is the address in the dynamic array variable, then the reference count is at address N-8, and the number of allocated elements (the length indicator) at N-4. The first element is at address N.
How to access these:
SetLength(a1, 100);
a2 := a1;
// Reference Count = 2
refCount := PInteger(NativeUInt(#a1[0]) - SizeOf(NativeInt) - SizeOf(Integer))^;
// Array Length = 100
arrLength := PNativeInt(NativeUInt(#a1[0]) - SizeOf(NativeInt))^;
The trick in computing proper offsets is to account for differences between 32bit and 64bit platforms code. Fields size in bytes is as follows:
32bit 64bit
RefCount 4 4
Length 4 8
I'm doing some heavy work on large integer numbers in UInt64 values, and was wondering if Delphi has an integer square root function.
Fow now I'm using Trunc(Sqrt(x*1.0)) but I guess there must be a more performant way, perhaps with a snippet of inline assembler? (Sqrt(x)with x:UInt64 throws an invalid type compiler error in D7, hence the *1.0 bit.)
I am very far from an expert on assembly, so this answer is just me fooling around.
However, this seems to work:
function isqrt(const X: Extended): integer;
asm
fld X
fsqrt
fistp #Result
fwait
end;
as long as you set the FPU control word's rounding setting to "truncate" prior to calling isqrt. The easiest way might be to define the helper function
function SetupRoundModeForSqrti: word;
begin
result := Get8087CW;
Set8087CW(result or $600);
end;
and then you can do
procedure TForm1.FormCreate(Sender: TObject);
var
oldCW: word;
begin
oldCW := SetupRoundModeForSqrti; // setup CW
// Compute a few million integer square roots using isqrt here
Set8087CW(oldCW); // restore CW
end;
Test
Does this really improve performance? Well, I tested
procedure TForm1.FormCreate(Sender: TObject);
var
oldCW: word;
p1, p2: Int64;
i: Integer;
s1, s2: string;
const
N = 10000000;
begin
oldCW := SetupRoundModeForSqrti;
QueryPerformanceCounter(p1);
for i := 0 to N do
Tag := isqrt(i);
QueryPerformanceCounter(p2);
s1 := inttostr(p2-p1);
QueryPerformanceCounter(p1);
for i := 0 to N do
Tag := trunc(Sqrt(i));
QueryPerformanceCounter(p2);
s2 := inttostr(p2-p1);
Set8087CW(oldCW);
ShowMessage(s1 + #13#10 + s2);
end;
and got the result
371802
371774.
Hence, it is simply not worth it. The naive approach trunc(sqrt(x)) is far easier to read and maintain, has superior future and backward compatibility, and is less prone to errors.
I believe that the answer is no it does not have an integer square root function and that your solution is reasonable.
I'm a bit surprised at the need to multiple by 1.0 to convert to a floating point value. I think that must be a Delphi bug and more recent versions certainly behave as you would wish.
This is the code I end up using, based on one of the algorhythms listed on wikipedia
type
baseint=UInt64;//or cardinal for the 32-bit version
function isqrt(x:baseint):baseint;
var
p,q:baseint;
begin
//get highest power of four
p:=0;
q:=4;
while (q<>0) and (q<=x) do
begin
p:=q;
q:=q shl 2;
end;
//
q:=0;
while p<>0 do
begin
if x>=p+q then
begin
dec(x,p);
dec(x,q);
q:=(q shr 1)+p;
end
else
q:=q shr 1;
p:=p shr 2;
end;
Result:=q;
end;
Normally, in Delphi one would declare a function with a variable number of arguments using the 'array of const' method. However, for compatibility with code written in C, there's an much-unknown 'varargs' directive that can be added to a function declaration (I learned this while reading Rudy's excellent 'Pitfalls of convering' document).
As an example, one could have a function in C, declared like this :
void printf(const char *fmt, ...)
In Delphi, this would become :
procedure printf(const fmt: PChar); varargs;
My question is : How can I get to the contents of the stack when implementing a method which is defined with the 'varargs' directive?
I would expect that some tooling for this exists, like Dephi translations of the va_start(), va_arg() and va_end() functions, but I can't find this anywhere.
Please help!
PS: Please don't drift off in discussions about the 'why' or the 'array of const' alternative - I need this to write C-like patches for functions inside Xbox games (see the Delphi Xbox emulator project 'Dxbx' on sourceforge for details).
OK, I see the clarification in your question to mean that you need to implement a C import in Delphi. In that case, you need to implement varargs yourself.
The basic knowledge needed is the C calling convention on the x86: the stack grows downwards, and C pushes arguments from right to left. Thus, a pointer to the last declared argument, after it is incremented by the size of the last declared argument, will point to the tail argument list. From then, it's simply a matter of reading the argument out and incrementing the pointer by an appropriate size to move deeper into the stack. The x86 stack in 32-bit mode is 4-byte aligned generally, and this also means that bytes and words are passed as 32-bit integers.
Anyhow, here's a helper record in a demo program that shows how to read out data. Note that Delphi seems to be passing Extended types in a very odd way; however, you likely won't have to worry about that, as 10-byte floats aren't generally widely used in C, and aren't even implemented in the latest MS C, IIRC.
{$apptype console}
type
TArgPtr = record
private
FArgPtr: PByte;
class function Align(Ptr: Pointer; Align: Integer): Pointer; static;
public
constructor Create(LastArg: Pointer; Size: Integer);
// Read bytes, signed words etc. using Int32
// Make an unsigned version if necessary.
function ReadInt32: Integer;
// Exact floating-point semantics depend on C compiler.
// Delphi compiler passes Extended as 10-byte float; most C
// compilers pass all floating-point values as 8-byte floats.
function ReadDouble: Double;
function ReadExtended: Extended;
function ReadPChar: PChar;
procedure ReadArg(var Arg; Size: Integer);
end;
constructor TArgPtr.Create(LastArg: Pointer; Size: Integer);
begin
FArgPtr := LastArg;
// 32-bit x86 stack is generally 4-byte aligned
FArgPtr := Align(FArgPtr + Size, 4);
end;
class function TArgPtr.Align(Ptr: Pointer; Align: Integer): Pointer;
begin
Integer(Result) := (Integer(Ptr) + Align - 1) and not (Align - 1);
end;
function TArgPtr.ReadInt32: Integer;
begin
ReadArg(Result, SizeOf(Integer));
end;
function TArgPtr.ReadDouble: Double;
begin
ReadArg(Result, SizeOf(Double));
end;
function TArgPtr.ReadExtended: Extended;
begin
ReadArg(Result, SizeOf(Extended));
end;
function TArgPtr.ReadPChar: PChar;
begin
ReadArg(Result, SizeOf(PChar));
end;
procedure TArgPtr.ReadArg(var Arg; Size: Integer);
begin
Move(FArgPtr^, Arg, Size);
FArgPtr := Align(FArgPtr + Size, 4);
end;
procedure Dump(const types: string); cdecl;
var
ap: TArgPtr;
cp: PChar;
begin
cp := PChar(types);
ap := TArgPtr.Create(#types, SizeOf(string));
while True do
begin
case cp^ of
#0:
begin
Writeln;
Exit;
end;
'i': Write(ap.ReadInt32, ' ');
'd': Write(ap.ReadDouble, ' ');
'e': Write(ap.ReadExtended, ' ');
's': Write(ap.ReadPChar, ' ');
else
Writeln('Unknown format');
Exit;
end;
Inc(cp);
end;
end;
type
PDump = procedure(const types: string) cdecl varargs;
var
MyDump: PDump;
function AsDouble(e: Extended): Double;
begin
Result := e;
end;
function AsSingle(e: Extended): Single;
begin
Result := e;
end;
procedure Go;
begin
MyDump := #Dump;
MyDump('iii', 10, 20, 30);
MyDump('sss', 'foo', 'bar', 'baz');
// Looks like Delphi passes Extended in byte-aligned
// stack offset, very strange; thus this doesn't work.
MyDump('e', 2.0);
// These two are more reliable.
MyDump('d', AsDouble(2));
// Singles passed as 8-byte floats.
MyDump('d', AsSingle(2));
end;
begin
Go;
end.
I found this (from a guy we know :))
To write this stuff properly you'll need to use BASM, Delphi's built in
assembler, and code the call sequence in asm. Hopefully you've got a good
idea of what you need to do. Perhaps a post in the .basm group will help if
you get stuck.
Delphi doesn't let you implement a varargs routine. It only works for importing external cdecl functions that use this.
Since varargs is based on the cdecl calling convention, you basically need to reimplement it yourself in Delphi, using assembly and/or various kinds of pointer manipulation.