SHR on int64 does not return expected result - delphi

I'm porting some C# code to Delphi (XE5). The C# code has code like this:
long t = ...
...
t = (t >> 25) + ...
I translated this to
t: int64;
...
t := (t shr 25) + ...
Now I see that Delphi (sometimes) calculates wrong values when shifting negative values of t, e.g.:
-170358640930559629 shr 25
Windows Calculator: -5077083139
C# code: -5077083139
Delphi:
-170358640930559629 shr 25 = 544678730749 (wrong)
For this example, -1*((-t shr 25)+1) gives the correct value in Delphi.
For other negative values of t a simple typecast to integer seems to give the correct result:
integer(t shr 25)
I am at my limit regarding binary operations and representations, so I would appreciate any help with simply getting the same results in Delphi as in C# and the Windows Calculator.

Based on the article linked in Filipe's answer (which explains that Delphi performs a logical shift, shr, where the others perform an arithmetic shift, sar), here's my take on this:
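function CalculatorRsh(Value: Int64; ShiftBits: Integer): Int64;
begin
  // Delphi's shr is a logical shift: the vacated high bits are filled with 0
  Result := Value shr ShiftBits;
  // For a negative Value, refill the vacated high bits with 1 (sign extension).
  // Value < 0 is an unambiguous sign test, and ShiftBits > 0 avoids a shift
  // by 64, which would OR in all ones (assumes 0 <= ShiftBits <= 63).
  if (Value < 0) and (ShiftBits > 0) then
    Result := Result or (Int64(-1) shl (64 - ShiftBits));
end;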

As you can read here, C and Delphi treat shifts differently. Not meaning to point fingers, but for signed operands C's >> isn't really a shr; in practice it is a sar (arithmetic shift). C#'s >> on a long is an arithmetic shift as well.
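Anyway, the only workaround I've found is to do the math manually: an arithmetic right shift by b bits is a division by 2^b that rounds toward minus infinity. Here's an example:
function SAR(a: Int64; b: Integer): Int64;
var
  d: Int64;
begin
  d := Int64(1) shl b;   // 2^b as a 64-bit value (assumes 0 <= b <= 62)
  Result := a div d;     // div truncates toward zero...
  if (a < 0) and (a mod d <> 0) then
    Dec(Result);         // ...so round inexact negative results down
end;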
Hope it helps!
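A different route that needs neither floating point nor division relies on the identity sar(a, n) = not ((not a) shr n) for negative a: inverting makes the value non-negative, the logical shift is then harmless, and inverting back turns the shifted-in zeros into ones. The following is a minimal sketch assuming 0 <= Bits <= 63; the name ArithShr is hypothetical and not part of the Delphi RTL.

```pascal
program ArithShiftDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

// Arithmetic (sign-propagating) right shift for Int64, i.e. what C#'s >>
// does on a long. Assumes 0 <= Bits <= 63.
function ArithShr(Value: Int64; Bits: Integer): Int64;
begin
  if Value >= 0 then
    // for non-negative values a logical shift already is an arithmetic shift
    Result := Value shr Bits
  else
    // for negative values: invert (giving a non-negative number), shift
    // logically, then invert back; the zeros shifted in become ones
    Result := not ((not Value) shr Bits);
end;

var
  t: Int64;
begin
  t := -170358640930559629;
  WriteLn('shr      : ', t shr 25);        // 544678730749 (logical shift)
  WriteLn('ArithShr : ', ArithShr(t, 25)); // -5077083139 (arithmetic shift)
  Assert(ArithShr(t, 25) = -5077083139);
  Assert(ArithShr(-33554432, 25) = -1);    // exact multiple of 2^25
end.
```

The second assertion covers the case where the -1*((-t shr 25)+1) trick from the question breaks down: when -t is an exact multiple of 2^25 it yields -2 instead of -1.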

Related

Comparing float with small integer

If I have
f: Single;
F := 0;
if F <> 0 then raise exception.create('xxx');
Will this comparison work on every platform? Or will I need to use round(f) <> 0 on some platforms? I know that on Windows F <> 0 is fine because 0 is an integer, but I'm curious about other platforms.
In the title, you ask for a general answer, and in the body, you ask about a specific case. I'm not sure which answer you are truly interested in. But in the general case, the answer is "it depends".
As others have commented, your specific example will never raise, but that does not mean it's safe to compare a float to 0.
Take this example:
procedure TForm5.Button1Click(Sender: TObject);
var
F: single;
begin
F := (7 / 10);
F := F - 0.7;
if F <> 0 then
raise Exception.Create('Error Message');
end;
This will (as far as I know) always raise.
Also, round(f) <> 0 wouldn't be the way to go about this. CompareValue(F, 0, Epsilon) <> EqualsValue would be, with Epsilon a tolerance that suits your data.
As for the "why" of all this, it has been answered (probably numerous times) on SO; you can start here.
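A small console sketch of the tolerance-based comparison, using SameValue and CompareValue from the Math unit (EqualsValue comes from the Types unit). The tolerance 1E-6 is an arbitrary assumption; pick one that matches the magnitude of your data. The comparison values are declared as Single typed constants so that the Single overloads are chosen unambiguously.

```pascal
program FloatCompareDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils, Math, Types; // Math: SameValue, CompareValue; Types: EqualsValue

const
  Zero: Single = 0;
  Tolerance: Single = 1E-6; // assumed tolerance, not a universal constant

// The computation from the answer: 7/10 stored in a Single, minus the
// higher-precision literal 0.7.
function Residue: Single;
var
  F: Single;
begin
  F := 7 / 10;
  F := F - 0.7;
  Result := F;
end;

var
  F: Single;
begin
  F := Residue;
  WriteLn('F                    = ', FloatToStr(F)); // tiny, not exactly 0
  WriteLn('F <> 0               : ', F <> 0);
  WriteLn('SameValue            : ', SameValue(F, Zero, Tolerance));
  WriteLn('CompareValue = Equals: ', CompareValue(F, Zero, Tolerance) = EqualsValue);
  Assert(SameValue(F, Zero, Tolerance));
end.
```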

Delphi fastest FileSize for sizes > 10gb

I wanted to check with you experts whether there are any drawbacks to this function. Will it work properly on the various Windows versions? I am using Delphi Seattle (32- and 64-bit executables). I am using this instead of FindFirst for its speed.
function GetFileDetailsFromAttr(pFileName:WideString):int64;
var
wfad: TWin32FileAttributeData;
wSize:LARGE_INTEGER ;
begin
Result:=0 ;
if not GetFileAttributesEx(pwidechar(pFileName), GetFileExInfoStandard, @wfad) then
exit;
wSize.HighPart:=wfad.nFileSizeHigh ;
wSize.LowPart:=wfad.nFileSizeLow ;
result:=wsize.QuadPart ;
end;
The typical googled samples built around this call do not work for file sizes > 9GB. They end with something like:
result:=((&wfad.nFileSizeHigh) or (&wfad.nFileSizeLow))
Code with variant record is correct.
But this code
result:=((&wfad.nFileSizeHigh) or (&wfad.nFileSizeLow))
is just wrong: OR-ing two 32-bit values gives a 32-bit value, so the result can never exceed the 32-bit range.
The code from the link in the comments
result := Int64(info.nFileSizeLow) or Int64(info.nFileSizeHigh shl 32);
is wrong because it does not account for how the compiler works with 32- and 64-bit values. The next example shows how to handle this situation properly (see values d and e):
var
a, b: DWord;
c, d, e: Int64;
wSize:LARGE_INTEGER ;
begin
a := 1;
b := 1;
c := Int64(a) or Int64(b shl 32);
d := Int64(a) or Int64(b) shl 32;
wSize.LowPart := a;
wSize.HighPart := b;
e := wsize.QuadPart;
Caption := Format('$%x $%x $%x', [c, d, e]); // $1 $100000001 $100000001
end;
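Note that in the expression for c, b is shifted while it is still a 32-bit value, so its set bit never reaches bit 32 (Delphi takes the shift count modulo the operand size, so b shl 32 acts like b shl 0); only afterwards is the result widened to 64 bits. Hence c = $1, while d and e give the expected $100000001.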
Regardless of how you get the file size: it would be even faster to use a type (manual) that has existed for ~25 years to assign the file size directly to the function's result instead of using an intermediate variable:
Int64Rec(result).Hi:= wfad.nFileSizeHigh;
Int64Rec(result).Lo:= wfad.nFileSizeLow;
end;
In case this isn't obvious to anyone, here's what the compiled code looks like:
Above: the intermediate variable w: LARGE_INTEGER first receives the two 32-bit parts and is then itself assigned to the function's result. Cost: 10 instructions.
Above: the record Int64Rec is used to cast the function's result and assign both 32-bit parts directly, without the need for any other variable. Cost: 6 instructions.
Environment used: Delphi 7.0 (Build 8.1), compiler version 15.0, Win32 executable, code optimization: on.
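A self-contained sketch comparing the two correct approaches discussed above: widening to Int64 before shifting, and writing both halves through Int64Rec (declared in SysUtils). The function names are hypothetical, and the 10 GiB split (Hi = 2, Lo = $80000000) is only an illustration of a size above the 32-bit range.

```pascal
program HiLoDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils; // Int64Rec

// Widen the high part to Int64 BEFORE shifting, so the shift is 64-bit.
function CombineByShift(Hi, Lo: Cardinal): Int64;
begin
  Result := Int64(Lo) or (Int64(Hi) shl 32);
end;

// Write both halves straight into the result through the Int64Rec overlay.
function CombineByRec(Hi, Lo: Cardinal): Int64;
begin
  Int64Rec(Result).Lo := Lo;
  Int64Rec(Result).Hi := Hi;
end;

var
  Hi, Lo: Cardinal;
begin
  // 10 GiB = 2 * 2^32 + 2^31 bytes
  Hi := 2;
  Lo := $80000000;
  WriteLn(CombineByShift(Hi, Lo)); // 10737418240
  WriteLn(CombineByRec(Hi, Lo));   // 10737418240
  Assert(CombineByShift(Hi, Lo) = CombineByRec(Hi, Lo));
end.
```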

Combine two bytes in delphi

Suppose I have two byte variables in Delphi:
var
b1,b2,result:byte;
begin
b1:=$05;
b2:=$04;
result:=??? // $54
end;
How would I then combine the two to produce a byte of value $54?
The most trivial way is
result := b1 * $10 + b2
"Advanced" way:
result := b1 shl 4 + b2
The best way would be:
interface
function combine(a,b: cardinal): cardinal; inline; //declaring it inline here allows inlining in other units
implementation
function combine(a,b: cardinal): cardinal; inline;
begin
Assert((a <= $f));
Assert((b <= $f));
Result:= a * 16 + b;
end;
Working with byte registers slows down the processor due to partial register stalls.
The asserts are compiled out in release builds (where assertions are normally disabled, i.e. {$C-}).
If performance matters never use anything but integers (or cardinals).
I have no idea why people are talking about VMTs or DLLs. It's a simple inline function that does not even generate a call.
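A sketch of the shift form together with its inverse (splitting a byte back into its nibbles), which the answers above do not show. CombineNibbles and SplitNibbles are hypothetical names; "or" is used instead of "+", which gives the same result here because the two nibbles occupy disjoint bit ranges.

```pascal
program NibbleDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils; // IntToHex

// Packs two 4-bit values into one byte: Hi -> bits 4..7, Lo -> bits 0..3.
function CombineNibbles(Hi, Lo: Byte): Byte;
begin
  Assert((Hi <= $F) and (Lo <= $F));
  Result := (Hi shl 4) or Lo;
end;

// The inverse: split a byte into its high and low nibble.
procedure SplitNibbles(B: Byte; out Hi, Lo: Byte);
begin
  Hi := B shr 4;
  Lo := B and $F;
end;

var
  H, L: Byte;
begin
  WriteLn(IntToHex(CombineNibbles($05, $04), 2)); // 54
  SplitNibbles($54, H, L);
  WriteLn(H, ' ', L);                             // 5 4
  Assert((H = 5) and (L = 4));
end.
```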

How can I bit-reflect a byte in Delphi?

Is there an easy way to bit-reflect a byte variable in Delphi so that the most significant bit (MSB) gets the least significant bit (LSB) and vice versa?
In code you can do it like this:
function ReverseBits(b: Byte): Byte;
var
i: Integer;
begin
Result := 0;
for i := 1 to 8 do
begin
Result := (Result shl 1) or (b and 1);
b := b shr 1;
end;
end;
But a lookup table would be much more efficient, and only consume 256 bytes of memory.
function ReverseBits(b: Byte): Byte; inline;
const
Table: array [Byte] of Byte = (
0,128,64,192,32,160,96,224,16,144,80,208,48,176,112,240,
8,136,72,200,40,168,104,232,24,152,88,216,56,184,120,248,
4,132,68,196,36,164,100,228,20,148,84,212,52,180,116,244,
12,140,76,204,44,172,108,236,28,156,92,220,60,188,124,252,
2,130,66,194,34,162,98,226,18,146,82,210,50,178,114,242,
10,138,74,202,42,170,106,234,26,154,90,218,58,186,122,250,
6,134,70,198,38,166,102,230,22,150,86,214,54,182,118,246,
14,142,78,206,46,174,110,238,30,158,94,222,62,190,126,254,
1,129,65,193,33,161,97,225,17,145,81,209,49,177,113,241,
9,137,73,201,41,169,105,233,25,153,89,217,57,185,121,249,
5,133,69,197,37,165,101,229,21,149,85,213,53,181,117,245,
13,141,77,205,45,173,109,237,29,157,93,221,61,189,125,253,
3,131,67,195,35,163,99,227,19,147,83,211,51,179,115,243,
11,139,75,203,43,171,107,235,27,155,91,219,59,187,123,251,
7,135,71,199,39,167,103,231,23,151,87,215,55,183,119,247,
15,143,79,207,47,175,111,239,31,159,95,223,63,191,127,255
);
begin
Result := Table[b];
end;
This is more than 10 times faster than the version of the code that operates on individual bits.
Finally, I don't normally like to comment too negatively on accepted answers when I have a competing answer. In this case there are very serious problems with the answer that you accepted that I would like to state clearly for you and also for any future readers.
You accepted @Arioch's answer when it contained the same Pascal code as can be seen in this answer, together with two assembler versions. It turns out that those assembler versions are much slower than the Pascal version: they are twice as slow as the Pascal code.
It is a common fallacy that converting high level code to assembler results in faster code. If you do it badly then you can easily produce code that runs more slowly than the code emitted by the compiler. There are times when it is worth writing code in assembler but you must not ever do so without proper benchmarking.
What is particularly egregious about the use of assembler here is that it is so obvious that the table based solution will be exceedingly fast. It's hard to imagine how that could be significantly improved upon.
function BitFlip(B: Byte): Byte;
const
N: array[0..15] of Byte = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15);
begin
Result := N[B div 16] or N[B mod 16] shl 4;
end;
function ByteReverseLoop(b: byte): byte;
var i: integer;
begin
Result := 0; // actually not needed, just to make compiler happy
for i := 1 to 8 do
begin
Result := Result shl 1;
if Odd(b) then Result := Result or 1;
b := b shr 1;
end;
end;
If speed is important, you can use a lookup table. You fill it once at program start and then just take values from the table. Since you only need to map byte to byte, that takes 256x1 = 256 bytes of memory. And since recent Delphi versions support inline functions, that provides speed, readability and reliability (by encapsulating the array lookup in a function, you can be sure you won't change the values through some typo).
Var ByteReverseLUT: array[byte] of byte;
function ByteReverse(b: byte): byte; inline;
begin Result := ByteReverseLUT[b] end;
// Call once from the unit/program initialization to fill the table
procedure InitByteReverseLUT;
var b: byte;
begin
  for b := Low(ByteReverseLUT) to High(ByteReverseLUT) do
    ByteReverseLUT[b] := ByteReverseLoop(b);
end;
Speed comparison of several implementations that were mentioned on this forum.
AMD Phenom2 x710 / Win7 x64 / Delphi XE2 32-bit {$O+}
Pascal AND original: 12494
Pascal AND reversed: 33459
Pascal IF original: 46829
Pascal IF reversed: 45585
Asm SHIFT 1: 15802
Asm SHIFT 2: 15490
Asm SHIFT 3: 16212
Asm AND 1: 19408
Asm AND 2: 19601
Asm AND 3: 19802
Pascal AND unrolled: 10052
Asm Shift unrolled: 4573
LUT, called: 3192
Pascal math, called: 4614
http://pastebin.ca/2304708
Note: the LUT (lookup table) timings are probably rather optimistic here. Because the benchmark runs in a tight loop, the whole table gets pulled into the L1 CPU cache. In real computations this function would most likely be called much less frequently, and the L1 cache would not hold the entire table.
The results for the inlined Pascal function calls are bogus: Delphi did not call them at all, having detected that they had no side effects. Funnily, though, the timings were different:
Asm Shift unrolled: 4040
LUT, called: 3011
LUT, inlined: 977
Pascal unrolled: 10052
Pas. unrolled, inlined: 849
Pascal math, called: 4614
Pascal math, inlined: 6517
And here is the explanation:
Project1.dpr.427: d := BitFlipLUT(i)
0044AC45 8BC3 mov eax,ebx
0044AC47 E89CCAFFFF call BitFlipLUT
Project1.dpr.435: d := BitFlipLUTi(i)
Project1.dpr.444: d := MirrorByte(i);
0044ACF8 8BC3 mov eax,ebx
0044ACFA E881C8FFFF call MirrorByte
Project1.dpr.453: d := MirrorByteI(i);
0044AD55 8BC3 mov eax,ebx
Project1.dpr.460: d := MirrorByte7Op(i);
0044ADA3 8BC3 mov eax,ebx
0044ADA5 E8AEC7FFFF call MirrorByte7Op
Project1.dpr.462: d := MirrorByte7OpI(i);
0044ADF1 0FB6C3 movzx eax,bl
All calls to inlined functions were eliminated.
As for parameter passing, however, Delphi made three different decisions:
For the 1st call it eliminated the parameter passing together with the function call.
For the 2nd call it kept the parameter passing, even though the function was not called.
For the 3rd call it kept the parameter passing but changed it (movzx instead of mov), which turned out longer than the function call itself! Weird! :-)
Using brute force can be simple and effective.
This routine is NOT on par with David's LUT solution.
Update
Added array of byte as input and result assigned to array of byte as well.
This shows better performance for the LUT solution.
function MirrorByte(b : Byte) : Byte; inline;
begin
Result :=
((b and $01) shl 7) or
((b and $02) shl 5) or
((b and $04) shl 3) or
((b and $08) shl 1) or
((b and $10) shr 1) or
((b and $20) shr 3) or
((b and $40) shr 5) or
((b and $80) shr 7);
end;
Update 2
Googling a little, I found BitReverseObvious.
function MirrorByte7Op(b : Byte) : Byte; inline;
begin
Result :=
{$IFDEF WIN64} // This is slightly better in x64 than the code in x32
(((b * UInt64($80200802)) and UInt64($0884422110)) * UInt64($0101010101)) shr 32;
{$ENDIF}
{$IFDEF WIN32}
((b * $0802 and $22110) or (b * $8020 and $88440)) * $10101 shr 16;
{$ENDIF}
end;
This one is closer to the LUT solution, even faster in one test.
To sum up, MirrorByte7Op() is 5-30% slower than LUT in 3 of the tests, 5% faster in one test.
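Because the table-free formulas are easy to get subtly wrong, it is worth checking every implementation against the simple loop for all 256 inputs. A minimal sketch of such a check; the function names are illustrative, and MirrorByte4Op64 is simply the WIN64 branch of MirrorByte7Op compiled unconditionally so that both formulas are tested on any target.

```pascal
program BitReverseCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

// Reference implementation: move the bits over one at a time.
function ReverseBitsLoop(b: Byte): Byte;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to 8 do
  begin
    Result := (Result shl 1) or (b and 1);
    b := b shr 1;
  end;
end;

// Nibble table: reverse each half, then swap the halves.
function BitFlipNibble(B: Byte): Byte;
const
  N: array[0..15] of Byte = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15);
begin
  Result := N[B shr 4] or (N[B and 15] shl 4);
end;

// 32-bit multiply-and-mask formula (the WIN32 branch of MirrorByte7Op).
function MirrorByte7Op32(b: Byte): Byte;
begin
  Result := ((b * $0802 and $22110) or (b * $8020 and $88440)) * $10101 shr 16;
end;

// 64-bit formula (the WIN64 branch of MirrorByte7Op), compiled on any target.
function MirrorByte4Op64(b: Byte): Byte;
begin
  Result := (((b * UInt64($80200802)) and UInt64($0884422110)) * UInt64($0101010101)) shr 32;
end;

var
  i, Mismatches: Integer;
  r: Byte;
begin
  Mismatches := 0;
  for i := 0 to 255 do
  begin
    r := ReverseBitsLoop(i);
    if (BitFlipNibble(i) <> r) or (MirrorByte7Op32(i) <> r) or
       (MirrorByte4Op64(i) <> r) then
      Inc(Mismatches);
  end;
  WriteLn('Mismatches: ', Mismatches);
  Assert(Mismatches = 0);
end.
```

Once the loop version is known to be right, it can also be used to generate the 256-entry table at startup instead of typing it in by hand.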
Code to benchmark:
uses
System.Diagnostics;
const
cBit : Byte = $AA;
cLoopMax = 1000000000;
var
sw : TStopWatch;
arrB : array of byte;
i : Integer;
b : Byte;
begin
SetLength(arrB,cLoopMax);
for i := 0 TO Length(arrB) - 1 do
arrB[i]:= System.Random(256);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
b := cBit;
end;
sw.Stop;
WriteLn('Loop ',b:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
b := ReflectBits(arrB[i]);
end;
sw.Stop;
WriteLn('RB array in: ',b:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
b := MirrorByte(arrB[i]);
end;
sw.Stop;
WriteLn('MB array in: ',b:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
b := MirrorByte7Op(arrB[i]);
end;
sw.Stop;
WriteLn('MB7Op array in: ',b:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
arrB[i] := ReflectBits(arrB[i]);
end;
sw.Stop;
WriteLn('RB array in/out: ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
arrB[i]:= MirrorByte(arrB[i]);
end;
sw.Stop;
WriteLn('MB array in/out: ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
arrB[i]:= MirrorByte7Op(arrB[i]);
end;
sw.Stop;
WriteLn('MB7Op array in/out: ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);
ReadLn;
end.
Result of benchmark (XE3, i7 CPU 870):
                                   32 bit     64 bit
----------------------------------------------------
Byte assignment (= empty loop)     599 ms    2117 ms
MirrorByte to byte, array in      6991 ms    8746 ms
MirrorByte7Op to byte, array in   1384 ms    2510 ms
ReverseBits to byte, array in      945 ms    2119 ms
----------------------------------------------------
ReverseBits array in/out          1944 ms    3721 ms
MirrorByte7Op array in/out        1790 ms    3856 ms
BitFlipNibble array in/out        1995 ms    6730 ms
MirrorByte array in/out           7157 ms    8894 ms
ByteReverse array in/out         38246 ms   42303 ms
I added some of the other proposals (all inlined) in the last part of the table. The fairest test is probably a loop that reads from one array and writes the results to an array. ReverseBits (LUT) and MirrorByte7Op are comparable in speed, followed by BitFlipNibble (LUT), which underperforms a bit in x64.
Note: I added a new algorithm for the x64 part of MirrorByte7Op. It makes better use of the 64-bit registers and has fewer instructions.

Does Delphi have isqrt?

I'm doing some heavy work on large integer numbers in UInt64 values, and was wondering if Delphi has an integer square root function.
For now I'm using Trunc(Sqrt(x*1.0)), but I guess there must be a more performant way, perhaps with a snippet of inline assembler? (Sqrt(x) with x: UInt64 throws an invalid type compiler error in D7, hence the *1.0 bit.)
I am very far from an expert on assembly, so this answer is just me fooling around.
However, this seems to work:
function isqrt(const X: Extended): integer;
asm
fld X
fsqrt
fistp @Result
fwait
end;
as long as you set the FPU control word's rounding setting to "truncate" prior to calling isqrt. The easiest way might be to define the helper function
function SetupRoundModeForSqrti: word;
begin
result := Get8087CW;
Set8087CW(result or $600);
end;
and then you can do
procedure TForm1.FormCreate(Sender: TObject);
var
oldCW: word;
begin
oldCW := SetupRoundModeForSqrti; // setup CW
// Compute a few million integer square roots using isqrt here
Set8087CW(oldCW); // restore CW
end;
Test
Does this really improve performance? Well, I tested
procedure TForm1.FormCreate(Sender: TObject);
var
oldCW: word;
p1, p2: Int64;
i: Integer;
s1, s2: string;
const
N = 10000000;
begin
oldCW := SetupRoundModeForSqrti;
QueryPerformanceCounter(p1);
for i := 0 to N do
Tag := isqrt(i);
QueryPerformanceCounter(p2);
s1 := inttostr(p2-p1);
QueryPerformanceCounter(p1);
for i := 0 to N do
Tag := trunc(Sqrt(i));
QueryPerformanceCounter(p2);
s2 := inttostr(p2-p1);
Set8087CW(oldCW);
ShowMessage(s1 + #13#10 + s2);
end;
and got the result
371802
371774.
Hence, it is simply not worth it. The naive approach trunc(sqrt(x)) is far easier to read and maintain, has superior future and backward compatibility, and is less prone to errors.
I believe that the answer is no it does not have an integer square root function and that your solution is reasonable.
I'm a bit surprised at the need to multiply by 1.0 to convert to a floating-point value. I think that must be a Delphi bug, and more recent versions certainly behave as you would wish.
This is the code I ended up using, based on one of the algorithms listed on Wikipedia:
type
baseint=UInt64;//or cardinal for the 32-bit version
function isqrt(x:baseint):baseint;
var
p,q:baseint;
begin
//get highest power of four <= x (start at 1 so that x = 1..3 gives 1)
p:=0;
q:=1;
while (q<>0) and (q<=x) do
begin
p:=q;
q:=q shl 2;
end;
//
q:=0;
while p<>0 do
begin
if x>=p+q then
begin
dec(x,p);
dec(x,q);
q:=(q shr 1)+p;
end
else
q:=q shr 1;
p:=p shr 2;
end;
Result:=q;
end;
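A quick self-check for the routine above, repeated here so the program compiles on its own. Note that the power-of-four search must start at 1; starting at 4 would make x = 1..3 return 0. The check verifies the defining property r*r <= x < (r+1)*(r+1) for small x plus the largest UInt64; the tested range is an arbitrary choice.

```pascal
program ISqrtCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

// Digit-by-digit integer square root: largest r with r*r <= x.
function isqrt(x: UInt64): UInt64;
var
  p, q: UInt64;
begin
  // p := highest power of four <= x (stays 0 when x = 0)
  p := 0;
  q := 1;
  while (q <> 0) and (q <= x) do
  begin
    p := q;
    q := q shl 2; // wraps to 0 after 4^31, which ends the loop
  end;
  // build the root one bit at a time; q accumulates the result
  q := 0;
  while p <> 0 do
  begin
    if x >= p + q then
    begin
      x := x - (p + q);
      q := (q shr 1) + p;
    end
    else
      q := q shr 1;
    p := p shr 2;
  end;
  Result := q;
end;

var
  i: Integer;
  x, r: UInt64;
begin
  // defining property: r*r <= x < (r+1)*(r+1)
  for i := 0 to 100000 do
  begin
    x := i;
    r := isqrt(x);
    Assert((r * r <= x) and ((r + 1) * (r + 1) > x));
  end;
  // largest input: the root is 2^32 - 1
  WriteLn(isqrt(High(UInt64))); // 4294967295
  Assert(isqrt(High(UInt64)) = 4294967295);
  WriteLn('all checks passed');
end.
```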
