Delphi Tokyo 64-bit flushes denormal numbers to zero?

During a short look at the source code of System.Math, I discovered that
the 64-bit version of Delphi Tokyo 10.2.3 flushes denormal IEEE doubles to zero, as can be seen from the following program:
{$apptype console}
uses
system.sysutils, system.math;
var
x: double;
const
twopm1030 : UInt64 = $0000100000000000; {2^(-1030)}
begin
x := PDouble(@twopm1030)^;
writeln(x);
x := ldexp(1,-515);
writeln(x*x);
x := ldexp(1,-1030);
writeln(x);
end.
For 32-bit the output is as expected
8.69169475979376E-0311
8.69169475979376E-0311
8.69169475979376E-0311
but with 64-bit I get
8.69169475979375E-0311
0.00000000000000E+0000
0.00000000000000E+0000
So basically Tokyo can handle denormal numbers in 64-bit mode (the constant is written correctly), but a denormal result from an arithmetic operation, or even from ldexp, is flushed to zero.
Can this observation be confirmed on other systems? If yes, where is it documented? (The only information I could find about zero-flushing is
that denormals become zero when stored in a Real48.)
Update: I know that for both 32- and 64-bit the Single overload is used. For 32-bit the x87 FPU is used and the ASM code is virtually identical for all precisions (single, double, extended). The FPU always returns an 80-bit extended, which is stored in a double without premature truncation. The 64-bit code does precision adjustment before storing.
Meanwhile I filed an issue report (https://quality.embarcadero.com/browse/RSP-20925), with the focus on the inconsistent results for 32- or 64-bit.

Update:
There is only a difference in how the compiler treats the overloaded selection.
As @Graymatter found out, the LdExp overload called is the Single one for both the 32-bit and the 64-bit compiler. The only difference is the code base: the 32-bit compiler uses asm code, while the 64-bit compiler has a purepascal implementation.
To make the code use the correct overload, explicitly specify the type of the first LdExp() argument, like this (works in 64-bit):
program Project116;
{$APPTYPE CONSOLE}
uses
system.sysutils, system.math;
var
x: double;
const
twopm1030 : UInt64 = $0000100000000000; {2^(-1030)}
begin
x := PDouble(@twopm1030)^;
writeln(x);
x := ldexp(Double(1),-515);
writeln(x*x);
x := ldexp(Double(1),-1030);
writeln(x);
ReadLn;
end.
Outputs:
8.69169475979375E-0311
8.69169475979375E-0311
8.69169475979375E-0311
I would say that this behaviour should be reported as an RTL bug, since the 32-bit and the 64-bit compilers should produce the same result. The overload selected in your case is the Single one, yet the result is assigned to a Double, and the compiler should definitely adapt accordingly.
Note that the Double(1) typecast for floating-point types was introduced in Delphi 10.2 Tokyo. For solutions in previous versions, see "What is first version of Delphi which allows typecasts like double(10)".
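For compilers that lack the Double(1) cast, one possible workaround is to pass a typed Double constant (or variable), so that overload resolution can only pick the Double version. This is a minimal sketch rather than something taken from the linked question: the name One is invented for the example, and the System.Math unit scope name assumes XE2 or later.

```pascal
program LdexpTypedArg;
{$APPTYPE CONSOLE}
uses
  System.Math;
const
  // Typed constant: its type is Double, so Ldexp(One, ...) cannot
  // silently bind to the Single overload.
  One: Double = 1.0;
var
  x: Double;
begin
  x := Ldexp(One, -515);
  Writeln(x * x);
  x := Ldexp(One, -1030);
  Writeln(x);
end.
```

On a 64-bit build this should print the same denormal values as the Double(1) version above, since the same overload ends up being called.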

The problem here is that Ldexp(single) is returning different results depending on whether the ASM code is being called or whether the pascal code is called. In both cases, the compiler is calling the Single version of the overload because the type isn't specified in the call.
The Pascal code that is executed in the Win64 scenario tries to deal with exponents less than -126, but it still cannot calculate the result correctly because Single values are limited to an 8-bit exponent: even the smallest Single denormal is 2^(-149), so 2^(-515) and 2^(-1030) underflow to zero. The assembler version seems to get around this, but I didn't look into why in much detail.
function Ldexp(const X: Single; const P: Integer): Single;
{ Result := X * (2^P) }
{$IFNDEF X86ASM}
var
T: Single;
I: Integer;
const
MaxExp = 127;
MinExp = -126;
FractionOfOne = $00800000;
begin
T := X;
Result := X;
case T.SpecialType of
fsDenormal,
fsNDenormal,
fsPositive,
fsNegative:
begin
FClearExcept;
I := P;
if I > MaxExp then
begin
T.BuildUp(False, FractionOfOne, MaxExp);
Result := Result * T;
I := I - MaxExp;
if I > MaxExp then I := MaxExp;
end
else if I < MinExp then
begin
T.BuildUp(False, FractionOfOne, MinExp);
Result := Result * T;
I := I - MinExp;
if I < MinExp then I := MinExp;
end;
if I <> 0 then
begin
T.BuildUp(False, FractionOfOne, I);
Result := Result * T;
end;
FCheckExcept;
end;
// fsZero,
// fsNZero,
// fsInf,
// fsNInf,
// fsNaN:
else
;
end;
end;
{$ELSE X86ASM}
{$IF defined(CPUX86) and defined(IOS)} // iOS/Simulator
...
{$ELSE}
asm // StackAlignSafe
PUSH EAX
FILD dword ptr [ESP]
FLD X
FSCALE
POP EAX
FSTP ST(1)
FWAIT
end;
{$ENDIF}
{$ENDIF X86ASM}
As LU RD suggested, you can get around the problem by forcing the methods to call the Double overload. There is a bug but that bug is that the ASM code doesn't match the pascal code in Ldexp(const X: Single; const P: Integer), not that a different overload is being called.

Related

64-bit Equivalent to BSWAP in X86 Image Handling Routine

I have a Delphi Firemonkey EXIF implementation I'm using in a routine to load image files. I'm trying to determine whether or not the image has been rotated, so I can correct the orientation of the image before displaying it. This routine, in part calls assembly code that executes a BSWAP to determine where header information in the image file is located. Here is a part of the code:
type
TMarker = packed record
Marker : Word; //Section marker
Len : Word; //Length Section
Indefin : Array [0..4] of Char; //Identifier - "Exif" 00, "JFIF" 00, etc.
Pad : Char; //0x00
end;
TIFDHeader = packed record
pad : Byte; //00h
ByteOrder : Word; //II (4949) or MM (4D4D)
i42 : Word; //2A00 (magic number from the 'Hitchhiker's Guide')
Offset : Cardinal; //0th offset IFD
Count : Word; // number of IFD entries
end;
function SwapLong(Value: Cardinal): Cardinal;
asm bswap eax end;
procedure TExif.ReadFromFile(const FileName: string);
var
j: TMarker;
ifd: TIFDHeader;
off0: Cardinal; //Null Exif Offset
SOI: Word; //2 bytes SOI marker. FF D8 (Start Of Image)
f: File;
begin
if not FileExists(FileName) then exit;
Init;
System.FileMode:=0; //Read Only open
AssignFile(f,FileName);
reset(f,1);
BlockRead(f,SOI,2);
if SOI=$D8FF then begin //Is this Jpeg
BlockRead(f,j,9);
if j.Marker=$E0FF then begin //JFIF Marker Found
Seek(f,20); //Skip JFIF Header
BlockRead(f,j,9);
end;
//Search Exif start marker;
if j.Marker<>$E1FF then begin
i:=0;
repeat
BlockRead(f,SOI,2); //Read bytes.
inc(i);
until (EOF(f) or (i>1000) or (SOI=$E1FF));
//If we find the marker
if SOI=$E1FF then begin
Seek(f,FilePos(f)-2); //go back 2 bytes
BlockRead(f,j,9); //read Exif header
end;
end;
if j.Marker=$E1FF then begin //If we found Exif Section. j.Indefin='Exif'.
FValid:=True;
off0:=FilePos(f)+1; //0'th offset Exif header
BlockRead(f,ifd,11); //Read IFD Header
FSwap := ifd.ByteOrder=$4D4D; // II or MM - if MM we have to swap
if FSwap then begin
ifd.Offset := SwapLong(ifd.Offset);
ifd.Count := Swap(ifd.Count);
end;
if ifd.Offset <> 8 then begin
Seek(f, FilePos(f)+abs(ifd.Offset)-8);
end;
This works fine when the application is built for 32-bit Windows, but fails at the SwapLong call under 64-bit Windows. I don't know the first thing about assembly language, so I'm looking for a way to get the same functionality when building the 64-bit version of the program. Just as a note, in both versions the ifd.Offset value passed to the SwapLong function is 134217728 ($08000000). In the 32-bit version SwapLong returns a value of 8, but the 64-bit version returns a value of 2694969615 given what appears to be the same input.
I need the 64-bit version to work because I am looking to target 64-bit Mac OS X with the same code. Any help would be greatly appreciated.
The issue exists because the inline assembly assumes the first argument as well as the return value to be using register eax, which is true for Delphi in 32-bit mode as per Delphi's calling convention (and although the inline assembly documentation states that there shouldn't be made any assumptions about registers other than ebp and esp, this always held true even inside of inline assembly statements when they were placed at the top of a function).
However, 64-bit mode uses a different calling convention in which the first argument is in rcx and the return value is using rax. So here you are getting random uninitialized garbage as return value that happened to be in that register (with its bytes swapped) because it's never explicitly set.
The best, portable solution would be to implement the byte swap in pure Pascal without inline assembly:
function SwapLong(Value: Cardinal): Cardinal;
begin
Result := Swap(Value shr 16) or (Cardinal(Swap(Value)) shl 16);
end;
This uses the decades-old Swap function which swaps the lower 2 bytes of a value. This isn't of much use on its own anymore but it can be utilized twice (together with some bit shifting and masking) to shorten code for swapping all 4 bytes of a 32-bit value.
Another way which has more source code but can produce less convoluted assembly code as a result would be accessing the individual bytes in the Cardinal using byte pointers:
function SwapLong(Value: Cardinal): Cardinal; inline;
begin
PByte(@Result)^ := PByte(NativeUInt(@Value) + 3)^;
PByte(NativeUInt(@Result) + 1)^ := PByte(NativeUInt(@Value) + 2)^;
PByte(NativeUInt(@Result) + 2)^ := PByte(NativeUInt(@Value) + 1)^;
PByte(NativeUInt(@Result) + 3)^ := PByte(@Value)^;
end;
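A third variant, sketched here with nothing but shifts and masks, avoids both Swap and pointer arithmetic. The name SwapLongMasked is invented for this example so that it does not clash with the versions above; the input $08000000 is the value from the question.

```pascal
program SwapLongDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Reverses the byte order of a 32-bit value by moving each byte
// to its mirrored position.
function SwapLongMasked(Value: Cardinal): Cardinal;
begin
  Result := ((Value and $000000FF) shl 24) or
            ((Value and $0000FF00) shl 8) or
            ((Value and $00FF0000) shr 8) or
            ((Value and $FF000000) shr 24);
end;

begin
  Writeln(IntToHex(Int64(SwapLongMasked($08000000)), 8)); // 00000008
  Writeln(IntToHex(Int64(SwapLongMasked($12345678)), 8)); // 78563412
end.
```

Because it contains no assembly and makes no assumptions about registers, the same source should compile unchanged for 32-bit and 64-bit targets.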
64-bit assembly passes parameters in different registers than 32-bit. In this case, the parameter will be in the ECX register, and the return value needs to be in EAX.
That requires different code for 32-bit and 64-bit assembly.
function SwapLong(Value: Cardinal): Cardinal;
{$IFDEF ASSEMBLER}
{$IFDEF CPUX86}
asm
bswap eax
end;
{$ENDIF CPUX86}
{$IFDEF CPUX64}
asm
mov eax, ecx
bswap eax
end;
{$ENDIF CPUX64}
{$ELSE}
begin
// pascal version
end;
{$ENDIF}
Since inline assembly is only available on Windows, other platforms need pure Pascal code, as shown in CherryDT's answer.

Different optimizations in Math.Sum in Win32/64

I have the following code
const
NumIterations = 10000000;
var
i, j : Integer;
x : array[1..100] of Double;
Start : Cardinal;
S : Double;
begin
for i := Low(x) to High(x) do x[i] := i;
Start := GetTickCount;
for i := 1 to NumIterations do S := System.Math.Sum(x);
ShowMessage('Math.Sum: ' + IntToStr(GetTickCount - Start));
Start := GetTickCount;
for i := 1 to NumIterations do begin
S := 0;
for j := Low(x) to High(x) do S := S + x[j];
end;
ShowMessage('Simple Sum: ' + IntToStr(GetTickCount - Start));
end;
When compiled for Win32 Math.Sum is considerably faster than the simple loop, as Math.Sum is written in Assembler and uses four-fold loop unrolling.
But when compiled for Win64, Math.Sum is considerably slower than the simple loop, because in 64-bit Math.Sum uses Kahan summation. This is an optimization for accuracy minimizing pile-up of errors during the summation process, but is considerably slower than even the simple loop.
I.e. when compiling for Win32 I get code optimized for speed, when compiling the same code for Win64 I get code optimized for accuracy. This is not exactly what I naively would expect.
Is there any sensible reason for this difference between Win32/64? Double is always 8 byte, so the accuracy should be identical in Win32/64.
Is Math.Sum still implemented identically (Assembler and loop unrolling in Win32, Kahan summation in Win64) in current versions of Delphi? I use Delphi-XE5.
"Is Math.Sum still implemented identically (Assembler and loop unrolling in Win32, Kahan summation in Win64) in current versions of Delphi? I use Delphi-XE5."
Yes (Delphi 10.3.2).
"Is there any sensible reason for this difference between Win32/64? Double is always 8 byte, so the accuracy should be identical in Win32/64."
32-bit Delphi for Win32 uses the old x87 FPU, while the 64-bit compiler uses SSE instructions. When the 64-bit compiler was introduced in XE2, many of the old assembly routines were not ported to 64-bit. Instead, some routines were reimplemented with functionality similar to that of other modern compilers.
You can enhance the 64-bit implementation a bit by introducing a Kahan summation function:
program TestKahanSum;
{$APPTYPE CONSOLE}
uses
System.SysUtils,Math,Diagnostics;
function KahanSum(const input : TArray<Double>): Double;
var
sum,c,y,t : Double;
i : Integer;
begin
sum := 0.0;
c := 0.0;
for i := Low(input) to High(input) do begin
y := input[i] - c;
t := sum + y;
c := (t - sum) - y;
sum := t;
end;
Result := sum;
end;
var
dArr : TArray<Double>;
res : Double;
i : Integer;
sw : TStopWatch;
begin
SetLength(dArr,100000000);
for i := 0 to High(dArr) do dArr[i] := Pi;
sw := TStopWatch.StartNew;
res := Math.Sum(dArr);
WriteLn('Math.Sum:',res,' [ms]:',sw.ElapsedMilliseconds);
sw := TStopWatch.StartNew;
res := KahanSum(dArr);
WriteLn('KahanSum:',res,' [ms]:',sw.ElapsedMilliseconds);
sw := TStopWatch.StartNew;
res := 0;
for i := 0 to High(dArr) do res := res + dArr[i];
WriteLn('NaiveSum:',res,' [ms]:',sw.ElapsedMilliseconds);
ReadLn;
end.
64-bit:
Math.Sum: 3.14159265358979E+0008 [ms]:492
KahanSum: 3.14159265358979E+0008 [ms]:359
NaiveSum: 3.14159265624272E+0008 [ms]:246
32-bit:
Math.Sum: 3.14159265358957E+0008 [ms]:67
KahanSum: 3.14159265358979E+0008 [ms]:958
NaiveSum: 3.14159265624272E+0008 [ms]:277
Pi with 15 digits is 3.14159265358979
The 32-bit math assembly routine is accurate to 13 digits in this example, while the 64-bit math routine is accurate to 15 digits.
Conclusion:
The 64-bit implementation is slower (by a factor of two compared to a naive summation), but more accurate than the 32-bit math routine.
In 64-bit, the hand-written Kahan summation routine takes about 27% less time than Math.Sum (359 ms versus 492 ms).
Having the very same RTL function behave differently when switching the compilation target is an awful bug. It should not change the behavior. Even worse, the Win64/Pascal Sum() does not behave the same for Single and Double: Sum(Single) is naive summing, whereas Sum(Double) uses Kahan... :(
You had better either use the plain + operator or write your own Kahan sum function.
I can confirm that the bug is still there in Delphi 10.3.
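If you do write your own routine, declaring the parameter as an open array lets it accept the static array from the question (array[1..100] of Double) as well as dynamic arrays. The following is a sketch under that assumption, not RTL code; the summation loop is the same Kahan scheme as in the answer above.

```pascal
program KahanOpenArray;
{$APPTYPE CONSOLE}

// Kahan (compensated) summation over an open array parameter, which
// accepts static arrays, dynamic arrays and array constructors alike.
function KahanSum(const Data: array of Double): Double;
var
  Sum, C, Y, T: Double;
  I: Integer;
begin
  Sum := 0.0;
  C := 0.0; // running compensation for lost low-order bits
  for I := Low(Data) to High(Data) do
  begin
    Y := Data[I] - C;
    T := Sum + Y;
    C := (T - Sum) - Y;
    Sum := T;
  end;
  Result := Sum;
end;

var
  X: array[1..100] of Double;
  I: Integer;
begin
  for I := Low(X) to High(X) do
    X[I] := I;
  Writeln(KahanSum(X):0:1); // 5050.0
end.
```

Since this is plain Pascal, it behaves the same way on Win32 and Win64, which is the consistency the RTL routine lacks.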

FNV hash produces different result in 64 bit OS

I use FNV to hash a file on my 32-bit OS. Why is the hash result of the same code different when I run it on a 64-bit OS?
Here's my code:
function gethash(const dwOffset: PByteArray; const dwLen: DWORD;
const offset_basis: DWORD): DWORD;
var
i: integer;
begin
Result := offset_basis;
try
{$R-}{$I-}
for i := 0 to dwLen - 1 do
Result := (Result * 16777619) xor DWORD(dwOffset^[i]);
{$R+}{$I+}
except
;
end;
end;
This code will produce the same output irrespective of the bitness of the operating system. Furthermore the output is independent of the bitness of the process. That is, if you compile for 32 bit and 64 bit, the output will be the same.
The logical conclusion therefore, is that the different output is caused by supplying different input.
One might ask why you have a swallow-all exception handler in your code. That is a really bad idea. If you supply incorrect parameters which lead to an access violation, you'll never find out. I urge you to remove that exception handler.
To demonstrate my point, I offer the following simple test program:
{$APPTYPE CONSOLE}
uses
Winapi.Windows,
System.SysUtils;
{$R-}{$I-}
function gethash(const dwOffset: PByteArray; const dwLen: DWORD;
const offset_basis: DWORD): DWORD;
var
i: integer;
begin
Result := offset_basis;
for i := 0 to dwLen - 1 do
Result := (Result * 16777619) xor DWORD(dwOffset^[i]);
end;
procedure Main;
var
i: Integer;
buf: TBytes;
begin
SetLength(buf, 666);
for i := 0 to high(buf) do
buf[i] := (i+13)*7 mod 256;
Writeln(gethash(PByteArray(buf), Length(buf), 17));
end;
begin
Main;
Readln;
end.
This produces the same output on every operating system, and the same output when compiled by either the 32 bit or 64 bit compiler. Whilst this does not prove that will be the case for every possible input, it gives you a test bed. You can replace my input with your own and you will discover, if I am right, that the output will always be the same, for fixed input.
One plausible explanation for your problem is that you are reading beyond the end of the buffer and so hashing ill-defined input data. And perhaps with a different OS or compiler, that ill-defined input data differs.
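One way to rule out such an overrun is to let the buffer carry its own length, for example by hashing a TBytes instead of a pointer plus a separate count. The sketch below makes that assumption; Fnv1Hash and the constant names are invented for the example. Note also that {$I-} only controls I/O checking; the switch that governs the wrapping multiplication is overflow checking, {$Q-}. The offset basis used is the standard 32-bit FNV value, and the expected output is the published FNV-1 hash of the single byte 'a'.

```pascal
program FnvTBytes;
{$APPTYPE CONSOLE}
{$Q-,R-} // the multiplication is meant to wrap around modulo 2^32

uses
  System.SysUtils;

const
  FnvPrime = 16777619;
  FnvOffsetBasis = Cardinal(2166136261); // standard 32-bit FNV offset basis

// FNV-1: multiply by the prime, then xor in the next byte.
// The loop bound comes from the array itself, so it cannot overrun.
function Fnv1Hash(const Data: TBytes; Basis: Cardinal): Cardinal;
var
  I: Integer;
begin
  Result := Basis;
  for I := 0 to High(Data) do
    Result := (Result * FnvPrime) xor Data[I];
end;

begin
  Writeln(IntToHex(Int64(Fnv1Hash(TBytes.Create(Ord('a')), FnvOffsetBasis)), 8)); // 050C5D7E
end.
```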

Passing an Ordinal Parameter

Is it possible to write a method which takes any ordinal type as a parameter? The same way Inc() or High() do?
I'm using Delphi 2007
You'd need to use an untyped parameter:
procedure Foo(const ordinal);
or
procedure Foo(var ordinal);
Of course, you are somewhat limited in what you could do inside such a routine because you have abandoned the type system.
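As an illustration of what remains possible, the caller can pass the size alongside the untyped parameter, and the routine can cast accordingly. This is only a sketch: IncAny and TShade are hypothetical names, and the routine has no way to check that the variable really is an ordinal.

```pascal
program UntypedOrdinal;
{$APPTYPE CONSOLE}
{$Q-} // let the increment wrap instead of raising an overflow error

type
  TShade = (shRed, shGreen, shBlue);

// Hypothetical helper: increments any ordinal variable, given its size.
procedure IncAny(var Value; Size: Integer);
begin
  case Size of
    1: Inc(Byte(Value));
    2: Inc(Word(Value));
    4: Inc(Cardinal(Value));
    8: Inc(Int64(Value));
  end;
end;

var
  B: Byte;
  I: Integer;
  S: TShade;
begin
  B := 1;
  I := 41;
  S := shRed;
  IncAny(B, SizeOf(B));
  IncAny(I, SizeOf(I));
  IncAny(S, SizeOf(S));
  Writeln(B, ' ', I, ' ', Ord(S)); // 2 42 1
end.
```

Unlike the built-in Inc, nothing stops this helper from pushing an enum or subrange past its last valid value, which is the price of leaving the type system.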
Found a possible way. It might not be what you expect, but hey, I found a way: use Variants. The problem with passing typeless parameters to a procedure is that you get a plain pointer with no type information, so you can't do anything useful with it. Bytes are 1 byte, enums of up to 256 elements are 1 byte, enums of up to 2^16 elements are 2 bytes, integers are 4 bytes (unless they're 8). But there is one type that allows anything to be passed and carries enough type information to make things work: the Variant. I intentionally wrote the following example in Delphi 7, to make sure I don't accidentally use any Delphi 2010 or Delphi XE goodness.
Edit: Updated the code sample to handle any type that's considered Ordinal by the Variants.VarTypeIsOrdinal. That includes all integer types + Boolean. Apparently Enum is seen as Byte, so it swallows that too.
program Project1;
{$APPTYPE CONSOLE}
uses
SysUtils, Variants;
type TSomeEnum = (e0, e1, e2, e3, e4);
procedure DoSomethingWithEnum(V: Variant);
var i: Integer;
b: Byte;
lw: LongWord; // Cardinal!
i64: Int64;
begin
case VarType(V) of
varInt64:
begin
i64 := V;
WriteLn(i64);
end;
varSmallint, varInteger, varShortInt:
begin
i := V;
WriteLn(i);
end;
varByte:
begin
b := V;
WriteLn(b);
end;
varWord, varLongWord:
begin
lw := V;
WriteLn(lw);
end;
varBoolean:
begin
if V then WriteLn('True') else WriteLn('False');
end;
else WriteLn('NOT an ordinal type (type = #' + IntToStr(Ord(VarType(V))) + ')');
end;
end;
var i: Integer;
b: Byte;
c: Cardinal;
enum: TSomeEnum;
w: Word;
si: Shortint;
begin
i := 1;
b := 2;
c := 3;
enum := e4;
w := 5;
si := -6;
DoSomethingWithEnum(i);
DoSomethingWithEnum(b);
DoSomethingWithEnum(c);
DoSomethingWithEnum(enum);
DoSomethingWithEnum(True);
DoSomethingWithEnum(w);
DoSomethingWithEnum(si);
Readln;
end.
The reason why doing this is difficult is that Inc(x), Dec(x) and others like Pred(x) and Succ(x) are actually generated by the compiler and are, if you like, merely function-style syntactic sugar over an inherent compiler operation.
You can, as some people suggest, do some of this with overloading, some of it with clever use of generics, and some of it with variants. But nothing will be as convenient for emulating these functions, or exactly the same functionally.
The compiler implements Inc(), for example, for all ordinal types, including enums, integers, and subranges of those types (a now rather obscure feature of classic "Wirth" Pascal is that all ordinal types can have subranges defined on them).
If you actually told us more about what you were doing, it might be possible to get closer. But the general answer is, No, there isn't even source code for Inc, and Dec, because these are compiler primitives. If there was RTL source code to the function Inc, you could go look at it, and adapt it.
inc(x) could be defined as x := Succ(x), but then, how do you define Succ(x)? As x := Inc(x)? You see. At some point, compiler "magic" takes over.

How can a function with 'varargs' retrieve the contents of the stack?

Normally, in Delphi one would declare a function with a variable number of arguments using the 'array of const' method. However, for compatibility with code written in C, there's a little-known 'varargs' directive that can be added to a function declaration (I learned this while reading Rudy's excellent 'Pitfalls of converting' document).
As an example, one could have a function in C, declared like this :
void printf(const char *fmt, ...)
In Delphi, this would become :
procedure printf(const fmt: PChar); varargs;
My question is : How can I get to the contents of the stack when implementing a method which is defined with the 'varargs' directive?
I would expect that some tooling for this exists, like Delphi translations of the va_start(), va_arg() and va_end() functions, but I can't find this anywhere.
Please help!
PS: Please don't drift off in discussions about the 'why' or the 'array of const' alternative - I need this to write C-like patches for functions inside Xbox games (see the Delphi Xbox emulator project 'Dxbx' on sourceforge for details).
OK, I see the clarification in your question to mean that you need to implement a C import in Delphi. In that case, you need to implement varargs yourself.
The basic knowledge needed is the C calling convention on the x86: the stack grows downwards, and C pushes arguments from right to left. Thus, a pointer to the last declared argument, after it is incremented by the size of the last declared argument, will point to the tail argument list. From then, it's simply a matter of reading the argument out and incrementing the pointer by an appropriate size to move deeper into the stack. The x86 stack in 32-bit mode is 4-byte aligned generally, and this also means that bytes and words are passed as 32-bit integers.
Anyhow, here's a helper record in a demo program that shows how to read out data. Note that Delphi seems to be passing Extended types in a very odd way; however, you likely won't have to worry about that, as 10-byte floats aren't generally widely used in C, and aren't even implemented in the latest MS C, IIRC.
{$apptype console}
type
TArgPtr = record
private
FArgPtr: PByte;
class function Align(Ptr: Pointer; Align: Integer): Pointer; static;
public
constructor Create(LastArg: Pointer; Size: Integer);
// Read bytes, signed words etc. using Int32
// Make an unsigned version if necessary.
function ReadInt32: Integer;
// Exact floating-point semantics depend on C compiler.
// Delphi compiler passes Extended as 10-byte float; most C
// compilers pass all floating-point values as 8-byte floats.
function ReadDouble: Double;
function ReadExtended: Extended;
function ReadPChar: PChar;
procedure ReadArg(var Arg; Size: Integer);
end;
constructor TArgPtr.Create(LastArg: Pointer; Size: Integer);
begin
FArgPtr := LastArg;
// 32-bit x86 stack is generally 4-byte aligned
FArgPtr := Align(FArgPtr + Size, 4);
end;
class function TArgPtr.Align(Ptr: Pointer; Align: Integer): Pointer;
begin
Integer(Result) := (Integer(Ptr) + Align - 1) and not (Align - 1);
end;
function TArgPtr.ReadInt32: Integer;
begin
ReadArg(Result, SizeOf(Integer));
end;
function TArgPtr.ReadDouble: Double;
begin
ReadArg(Result, SizeOf(Double));
end;
function TArgPtr.ReadExtended: Extended;
begin
ReadArg(Result, SizeOf(Extended));
end;
function TArgPtr.ReadPChar: PChar;
begin
ReadArg(Result, SizeOf(PChar));
end;
procedure TArgPtr.ReadArg(var Arg; Size: Integer);
begin
Move(FArgPtr^, Arg, Size);
FArgPtr := Align(FArgPtr + Size, 4);
end;
procedure Dump(const types: string); cdecl;
var
ap: TArgPtr;
cp: PChar;
begin
cp := PChar(types);
ap := TArgPtr.Create(@types, SizeOf(string));
while True do
begin
case cp^ of
#0:
begin
Writeln;
Exit;
end;
'i': Write(ap.ReadInt32, ' ');
'd': Write(ap.ReadDouble, ' ');
'e': Write(ap.ReadExtended, ' ');
's': Write(ap.ReadPChar, ' ');
else
Writeln('Unknown format');
Exit;
end;
Inc(cp);
end;
end;
type
PDump = procedure(const types: string) cdecl varargs;
var
MyDump: PDump;
function AsDouble(e: Extended): Double;
begin
Result := e;
end;
function AsSingle(e: Extended): Single;
begin
Result := e;
end;
procedure Go;
begin
MyDump := @Dump;
MyDump('iii', 10, 20, 30);
MyDump('sss', 'foo', 'bar', 'baz');
// Looks like Delphi passes Extended in byte-aligned
// stack offset, very strange; thus this doesn't work.
MyDump('e', 2.0);
// These two are more reliable.
MyDump('d', AsDouble(2));
// Singles passed as 8-byte floats.
MyDump('d', AsSingle(2));
end;
begin
Go;
end.
I found this (from a guy we know :))
To write this stuff properly you'll need to use BASM, Delphi's built in
assembler, and code the call sequence in asm. Hopefully you've got a good
idea of what you need to do. Perhaps a post in the .basm group will help if
you get stuck.
Delphi doesn't let you implement a varargs routine. It only works for importing external cdecl functions that use this.
Since varargs is based on the cdecl calling convention, you basically need to reimplement it yourself in Delphi, using assembly and/or various kinds of pointer manipulation.
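For contrast, this is roughly what the supported direction looks like: importing an external cdecl function and calling it with extra arguments. The sketch assumes Windows and binds to printf as exported by msvcrt.dll; the PAnsiChar cast on the string argument is there because the compiler cannot know from the declaration what type the C side expects.

```pascal
program ImportPrintf;
{$APPTYPE CONSOLE}

// Windows only: binds to printf exported by the Microsoft C runtime DLL.
// The varargs directive lets callers append arguments after Format.
function printf(Format: PAnsiChar): Integer; cdecl; varargs;
  external 'msvcrt.dll' name 'printf';

begin
  printf('%d + %d = %d'#10, 1, 2, 3);
  printf('%s has %d characters'#10, PAnsiChar('varargs'), 7);
end.
```

Here the compiler only has to push the extra arguments; reading them back on the callee side is the part that has to be written by hand, as described above.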
