Delphi wrong double precision calculation

I am having a problem calculating a simple arithmetic equation using double precision variables.
I have a component that has a property Value which is of double precision, and I am setting this property to 100.
Then I am doing a simple subtraction to check if this value is really 100:
var
check: double;
begin
check:= 100 - MyComponent.Value;
showmessage(floattostr(check));
end;
The problem is that I don't get zero; I get -1.4210854715202E-14, which is a problem because the program checks whether this result is exactly zero.
Any idea how to solve it?

Although you claim otherwise, the value returned by MyComponent.Value is clearly not exactly equal to 100. If it were, then 100 - MyComponent.Value would be exactly equal to 0. We can say that because 100 is exactly representable in binary floating point.
It's easy enough to see that 100.0 - 100.0 = 0.0.
var
x, y: Double;
....
x := 100.0;
y := 100.0;
Assert(x-y=0.0);
You will find, in your scenario, that
MyComponent.Value = 100.0
evaluates as False.
In general, comparing floating point values for exact equality is risky. When the values are the result of arithmetic operations, the inherent imprecision of floating point arithmetic means that exact comparison will often not give the results that you expect.
I hypothesise that MyComponent.Value actually performs arithmetic rather than, as you claim, returning 100.0.
Sometimes the best way to check for equality with floating point values is to check for approximate equality. For example, abs(x-y)<tol where tol is some small number. The difficulty with that is that it can be hard to come up with a robust choice of tol.
Exactly how you should implement this test is hard to say without knowing more detail.
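The reported difference, -1.4210854715202E-14, matches -2^-46 to the digits shown, and 2^-46 is the spacing between adjacent Double values near 100. That is consistent with Value holding the next representable Double above 100. The sketch below reproduces that situation and applies a tolerance test; the helper NearlyEqual, the tolerance 1E-9, and the way the neighbouring value is constructed are illustrative assumptions, not something taken from the component.

```pascal
program UlpDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: absolute-tolerance comparison, abs(x-y) < tol.
function NearlyEqual(const X, Y, Tol: Double): Boolean;
begin
  Result := Abs(X - Y) < Tol;
end;

var
  V: Double;
  Bits: Int64 absolute V;   // overlays the same 8 bytes as V
begin
  V := 100.0;
  Bits := Bits + 1;         // assumption: step to the next Double above 100

  Writeln(FloatToStr(V));                // displays as 100 at 15 significant digits
  Writeln(V = 100.0);                    // exact comparison fails
  Writeln(FloatToStr(100 - V));          // the tiny non-zero difference
  Writeln(NearlyEqual(V, 100.0, 1E-9));  // tolerance comparison succeeds
end.
```

An absolute tolerance like this is only reasonable when the magnitude of the values is known in advance; for values of widely varying size a relative tolerance is usually the safer choice.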

When using floating point numbers, you should never perform exact compares; always use a small Epsilon value as explained in The Floating-Point Guide - What Every Programmer Should Know.
Note: I deliberately state this in an absolute sense. Of course there
are exceptions, but in practice these are usually exceptional. Of
course you might be in the situation that your problem domain is
exceptional, and warrants exact compares. In the vast majority of code
that I have maintained, this is not the case.
The Math unit (it has been with Delphi for a long long time) contains the below functions that handle Epsilon values for you.
When you pass an Epsilon of zero (i.e. 0.0) or no value at all to the functions below, they will estimate a reasonable value using these constants.
Note:
The appropriate value of Epsilon you want to use depends on the
calculations you use: sometimes inaccuracies can accumulate to much
larger values than these constants.
const
FuzzFactor = 1000;
SingleResolution = 1E-7 * FuzzFactor;
DoubleResolution = 1E-15 * FuzzFactor;
{$IFDEF EXTENDEDIS10BYTES}
ExtendedResolution = 1E-19 * FuzzFactor;
{$ELSE EXTENDEDIS10BYTES}
ExtendedResolution = DoubleResolution;
{$ENDIF EXTENDEDIS10BYTES}
The functions:
function CompareValue(const A, B: Extended; Epsilon: Extended = 0): TValueRelationship; overload;
function CompareValue(const A, B: Double; Epsilon: Double = 0): TValueRelationship; overload;
function CompareValue(const A, B: Single; Epsilon: Single = 0): TValueRelationship; overload;
function SameValue(const A, B: Extended; Epsilon: Extended = 0): Boolean; overload;
function SameValue(const A, B: Double; Epsilon: Double = 0): Boolean; overload;
function SameValue(const A, B: Single; Epsilon: Single = 0): Boolean; overload;
function IsZero(const A: Extended; Epsilon: Extended = 0): Boolean; overload;
function IsZero(const A: Double; Epsilon: Double = 0): Boolean; overload;
function IsZero(const A: Single; Epsilon: Single = 0): Boolean; overload;
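As a rough illustration of how the default Epsilon behaves, here is a sketch that assumes the Double overloads and the DoubleResolution constant quoted above, so the effective tolerance near 1.0 is about 1E-12. The sample values are made up for the demonstration. The difference is stored in a Double variable before calling IsZero so that the Double overload (and its resolution) is the one selected.

```pascal
program DefaultEpsilonDemo;
{$APPTYPE CONSOLE}

uses
  Math;

var
  A, B, C, Diff: Double;
begin
  A := 1.0;
  B := 1.0 + 1E-14;   // differs from A by far less than 1E-12
  C := 1.0 + 1E-9;    // differs from A by more than 1E-12
  Diff := A - B;      // keep the difference in a Double variable

  Writeln(SameValue(A, B));        // within the default tolerance
  Writeln(SameValue(A, C));        // outside the default tolerance
  Writeln(IsZero(Diff));           // the difference counts as zero
  Writeln(SameValue(A, C, 1E-6));  // an explicit, looser Epsilon accepts it
end.
```

Passing an explicit Epsilon, as in the last call, is the way to go when accumulated error is expected to exceed the built-in resolution.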

I suggest you never check for zero by simple subtraction when one of the numbers is a float. Instead, use the IsZero function with whatever precision you want (e.g., 0.00001) for the Epsilon parameter.
procedure CheckFor100(MyPrecision: Double);
begin
if IsZero(100 - MyComponent.Value, MyPrecision) then
ShowMessage('MyComponent value was 100')
else
ShowMessage('MyComponent value was not 100');
end;
Alternatively, you may wish to consider the SameValue function. Both IsZero and SameValue are in the Math unit.

Related

To what accuracy are the local variables displayed in the Embarcadero RAD Studio XE2 debugger? Apparently 1 is not equal to 1

Take the following record:
TVector2D = record
public
class operator Equal(const V1, V2: TVector2D): Boolean;
class operator Multiply(const D: Accuracy; const V: TVector2D): TVector2D;
class operator Divide(const V: TVector2D; const D: Accuracy): TVector2D;
class function New(const x, y: Accuracy): TVector2D; static;
function Magnitude: Accuracy;
function Normalised: TVector2D;
public
x, y: Accuracy;
end;
With the methods defined as:
class operator TVector2D.Equal(const V1, V2: TVector2D): Boolean;
var
A, B: Boolean;
begin
Result := (V1.x = V2.x) and (V1.y = V2.y);
end;
class operator TVector2D.Multiply(const D: Accuracy; const V: TVector2D): TVector2D;
begin
Result.x := D*V.x;
Result.y := D*V.y;
end;
class operator TVector2D.Divide(const V: TVector2D; const D: Accuracy): TVector2D;
begin
Result := (1.0/D)*V;
end;
class function TVector2D.New(const x, y: Accuracy): TVector2D;
begin
Result.x := x;
Result.y := y;
end;
function TVector2D.Magnitude;
begin
RESULT := Sqrt(x*x + y*y);
end;
function TVector2D.Normalised: TVector2D;
begin
Result := Self/Magnitude;
end;
and a constant:
const
jHat2D : TVector2D = (x: 0; y: 1);
I would expect the Boolean value of (jHat2D = TVector2D.New(0,0.707).Normalised) to be True. Yet it comes out as False.
In the debugger TVector2D.New(0,0.707).Normalised.y shows as exactly 1.
It cannot be the case that this is exactly 1, otherwise the Boolean value of (jHat2D = TVector2D.New(0,0.707).Normalised) would be True.
Any ideas?
Edit
Accuracy is a Type defined as: Accuracy = Double

Assuming that Accuracy is a synonym for a Double type, this is a bug in the visualization of floating point values by the debugger. Due to the inherent problems with internal representation of floating points, v1.Y and v2.Y have very slightly different values, though both approximate to 1.
Add watches for v1.y and v2.y. Ensure that these watch values are configured to represent as "Floating Point" values with Digits set to 18 for maximum detail.
At your breakpoint you will see that:
v1.y = 1
v2.y = 0.999999999999999889
(whosrdaddy provided the above short version in the comments on the question, but I am retaining the long form of my investigation - see below the line after Conclusion - as it may prove useful in other, similar circumstances as well as being of potential interest)
Conclusion
Whilst the debugger visualizations are strictly speaking incorrect (or at best misleading), they are nevertheless very nearly correct. :)
The question then is whether you require strict accuracy or accuracy to within a certain tolerance. If the latter then you can adopt the use of SameValue() with an EPSILON defined suitable to the degree of accuracy you require.
Otherwise you must accept that when debugging your code you cannot rely on the debugger to represent the values involved in your debugging to the degree of accuracy relied on in the code itself.
Option: Customise the Debug Visualization Itself
Alternatively you may wish to investigate creating a custom debug visualisation for your TVector2D type to represent your x/y values to the accuracy employed in your code.
For such a visualization, instead of FloatToStr(), use Format() with a %f format specifier and a suitable number of decimal places. For example, the call below yields the same result as watching the variable as described above:
Format('%.18f', [v2.y]);
// Yields 0.999999999999999889
Long Version of Original Investigation
I modified the Equal operator to allow me to inspect the internal representation of the two values v1.y and v2.y:
type
PAccuracy = ^Accuracy;
class operator TVector2D.Equal(const V1, V2: TVector2D): Boolean;
var
A, B: Boolean;
ay, by: PAccuracy;
begin
// Take the addresses of the two y fields so their raw bytes can be watched
ay := @V1.y;
by := @V2.y;
A := (V1.x = V2.x);
B := (V1.y = V2.y);
result := A and B;
end;
By setting watches in the debugger to provide a Memory Dump of ay^ and by^ we see that the two values have different internal representations:
v1.y : $3f f0 00 00 00 00 00 00
v2.y : $3f ef ff ff ff ff ff ff
NOTE: Byte order is reversed in the watch value results, as compared to the actual values above, due to the Little Endian nature of Intel.
We can then test the hypothesis by passing Doubles with these internal representations into FloatToStr():
var
a: Double;
b: Double;
ai: Int64 absolute a;
bi: Int64 absolute b;
s: string;
begin
ai := $3ff0000000000000;
bi := $3fefffffffffffff;
s := FloatToStr(a) + ' = ' + FloatToStr(b);
// Yields 's' = '1 = 1';
end;
We can conclude therefore that the evaluation of B is correct. v1.y and v2.y are different. The representation of the Double values by the debugger is incorrect (or at best misleading).
By changing the expression for B to use SameValue() we can determine the deviation between the values involved:
uses
Math;
const
EPSILON = 0.1;
B := SameValue(V1.y, V2.y, EPSILON);
By progressively reducing the value of EPSILON we find that v1.y and v2.y differ by an amount less than 0.000000000000001 since:
EPSILON = 0.000000000000001; // Yields B = TRUE
EPSILON = 0.0000000000000001; // Yields B = FALSE

Your problem stems from the fact that the two floating point values are not 100% equal and that the Debug Inspector rounds the floating point values it displays. To see the real value, you need to add a watch and specify floating point as the visualizer.
The memory dump visualizer also reveals the difference between the two values.

Delphi XE2 Rounding with DecimalRounding_JH1

Because of a documented rounding issue in Delphi XE2, we are using a special rounding unit available on the Embarcadero site named DecimalRounding_JH1 to achieve true bankers rounding. A link to the unit can be found here:
DecimalRounding_JH1
Using this unit's DecimalRound function with numbers containing a large number of decimal places, we get a floating point error.
This is the rounding routine from the DecimalRounding_JH1 unit. In our example we call this DecimalRound function with the following parameters (166426800, 12, MaxRelErrDbl, drHalfEven) where maxRelErrDbl = 2.2204460493e-16 * 1.234375 * 2
Function DecimalRound(Value: extended; NDFD: integer; MaxRelErr: double;
Ctrl: tDecimalRoundingCtrl = drHalfEven): extended;
{ The DecimalRounding function is for doing the best possible job of rounding
floating binary point numbers to the specified (NDFD) number of decimal
fraction digits. MaxRelErr is the maximum relative error that will allowed
when determining when to apply the rounding rule. }
var i64, j64: Int64; k: integer; m, ScaledVal, ScaledErr: extended;
begin
If IsNaN(Value) or (Ctrl = drNone)
then begin Result := Value; EXIT end;
Assert(MaxRelErr > 0,
'MaxRelErr param in call to DecimalRound() must be greater than zero.');
{ Compute 10^NDFD and scale the Value and MaxError: }
m := 1; For k := 1 to abs(NDFD) do m := m*10;
If NDFD >= 0
then begin
ScaledVal := Value * m;
ScaledErr := abs(MaxRelErr*Value) * m;
end
else begin
ScaledVal := Value / m;
ScaledErr := abs(MaxRelErr*Value) / m;
end;
{ Do the diferent basic types separately: }
Case Ctrl of
drHalfEven: begin
i64 := round((ScaledVal - ScaledErr));
The last line is where we get a floating point error.
Any thoughts on why this error is occurring?

If you get an exception, that means you cannot represent your value as a double within the specified error range.
In other words, maxRelErrDbl is too small.
Try maxRelErrDbl = 0.0000000001 or something similar to test whether I am right.
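A separate thing that may be worth checking is magnitude. Round returns an Int64, and with NDFD = 12 the value 166426800 is scaled by 10^12 to roughly 1.66E20, which is beyond High(Int64) (about 9.22E18), so the call to Round can fail regardless of MaxRelErr. The sketch below only reproduces that arithmetic; FitsInInt64 is a hypothetical helper with a deliberately conservative bound and is not part of DecimalRounding_JH1.

```pascal
program RoundRangeCheck;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: conservative check that X can be passed to Round.
// 9.2E18 is chosen to sit safely below High(Int64), about 9.22E18.
function FitsInInt64(const X: Extended): Boolean;
begin
  Result := Abs(X) < 9.2E18;
end;

var
  k: Integer;
  m, ScaledVal: Extended;
begin
  // Same scaling as in DecimalRound: m = 10^NDFD with NDFD = 12
  m := 1;
  for k := 1 to 12 do
    m := m * 10;
  ScaledVal := 166426800 * m;

  Writeln(FloatToStr(ScaledVal));   // roughly 1.66E20
  Writeln(FitsInInt64(ScaledVal));  // FALSE: too large for Round
end.
```

If that is the cause here, a smaller NDFD (fewer decimal fraction digits) for values of this size would keep the scaled value within range.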

How to compare double in delphi?

We are facing an issue when comparing values of type Double:
if(p > pmax) then
begin
Showmessage('');
end
Even if both values are 100 (p = 100 and pmax = 100), execution still enters the if clause.

The Math.pas unit includes functions such as SameValue(), IsZero(), CompareValue() which handle floating type comparisons and equality.
const
EPSILON = 0.0000001;
begin
if CompareValue(p, pMax, EPSILON) = GreaterThanValue then
ShowMessage('p greater than pMax');
The constant GreaterThanValue is defined in Types.pas
If you're comparing very large values you shouldn't use a constant for epsilon, instead your epsilon value should be calculated based on the values you're comparing.
var
epsilon: double;
begin
epsilon := Max(Min(Abs(p), Abs(pMax)) * 0.000001, 0.000001);
if CompareValue(p, pMax, epsilon) = GreaterThanValue then
ShowMessage('p greater than pMax');
Note that if you use CompareValue(a, b, 0) or in XE2 and later CompareValue(a, b), Delphi will automatically fill in a good epsilon for you.
From the Delphi Math unit:
function SameValue(const A, B: Extended; Epsilon: Extended): Boolean;
begin
if Epsilon = 0 then
Epsilon := Max(Min(Abs(A), Abs(B)) * ExtendedResolution, ExtendedResolution);
if A > B then
Result := (A - B) <= Epsilon
else
Result := (B - A) <= Epsilon;
end;
As of Delphi XE2 there are now overloads for all these functions that do not require an epsilon parameter and instead calculate one for you (similar to passing a 0 value for epsilon). For code clarity I would recommend calling these simpler functions and let Delphi handle the epsilon.
The only reason not to use the overloads without epsilon parameters would be when performance is crucial and you want to avoid the overhead of having the epsilon repeatedly calculated.

There are several problems with comparing Doubles. One problem is that what you see is not exactly what you get due to rounding. You can have 99.999999996423 and 100.00000000001632, which are both rounded to 100, but they are not equal.
The solution is to use a margin so that, if the difference of the two Doubles lies within the margin, you accept them as equal.
You can create an IsEqual function using the margin as an optional parameter:
function IsEqual(const ANumber1, ANumber2: Double; const AMargin: Double = cMargin): Boolean;
begin
Result := Abs(ANumber1-ANumber2) <= AMargin;
end;

Converting float or negative integer to hexadecimal in Borland Delphi

I've written a program that communicates with some hardware over a serial connection.
It sends a lot of hexadecimal values my way (sensor readings), and every once in a while it sends a negative value.
For example, I receive the hexadecimal value FFFFF5D6
and I have to convert it into -2602.
Another problem I have is that I can't convert a float into hex and back.
Are there any simple ways of doing this?

You can "convert" from hex to float by using an integer large enough to cover the float value used, then using the ABSOLUTE keyword. All that is really doing is encoding the memory of the value as an integer. Be very careful to use types which are exactly the same size (you can use SIZEOF to find the memory size of a value). If you need an odd size, then use absolute against an array of byte and loop through, converting to/from each byte (which is two hex characters).
The ABSOLUTE keyword forces two variables to START at the same memory address, so any value written to one is immediately available in the other.
var
fDecimal : Double; // size = 8 bytes
fInteger : Int64 absolute fDecimal; // size = 8 bytes
begin
fDecimal := 3.14;
ShowMessage(format('%x=%f',[fInteger,fDecimal]));
fInteger := StrToInt64('$1234123412341234');
ShowMessage(FloatToStr(fDecimal)+'='+Format('%x',[fInteger]));
end;
Here is the routine for floats with odd sizes:
var
fDecimal : extended;
fInteger : array[1..10] of byte absolute fDecimal;
sHex : string;
iX : integer;
begin
ShowMessage(IntToStr(SizeOf(fDecimal))+':'+IntToStr(SizeOf(fInteger)));
fDecimal := 3.14;
sHex := '';
for iX := 1 to 10 do
sHex := sHex + IntToHex(fInteger[iX],2);
ShowMessage(sHex);
// clear the value
fDecimal := 0.0;
// Reload the value
for iX := 1 to (Length(sHex) DIV 2) do
fInteger[iX] := StrToInt('$'+Copy(sHex,(Ix*2)-1,2));
ShowMessage(FloatToStr(fDecimal));
end;

To convert a hex string into an integer, you can use the StrToInt function. You can also check the TryStrToInt function (which returns False if the string does not represent a valid number).
uses
SysUtils;
var
ivalue : integer;
begin
ivalue:=StrToInt('$FFFFF5D6'); // Hexadecimal values start with a '$' in Delphi
..
end;
For the hexadecimal representation of a float number, you can check these articles.
http://www.merlyn.demon.co.uk/pas-type.htm#Str
http://www.merlyn.demon.co.uk/pas-real.htm
http://www.merlyn.demon.co.uk/programs/hexfloat.pas (source code example)

I've never seen a float represented in hex, so you'd have to figure out exactly what format the device is using for that. For the negative number case, you'll need to convert the hex string to an integer (for example with StrToInt64 and a '$' prefix), and then determine whether that integer is larger than whatever threshold value represents MaxInt for the hardware's integer data type. If it's bigger than that, it's a negative number, which in the usual two's complement encoding means subtracting 2^N (N being the bit width) to recover the signed value.
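For the negative-integer case, a minimal sketch, assuming the device sends 32-bit two's complement values as up to eight hex digits. HexToSigned32 is a hypothetical helper name, not an RTL function.

```pascal
program HexSignedDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: interpret up to 8 hex digits as a signed 32-bit value.
function HexToSigned32(const Hex: string): Integer;
var
  V: Int64;
begin
  V := StrToInt64('$' + Hex);   // parse as a non-negative 64-bit value
  if V > High(Integer) then
    V := V - $100000000;        // subtract 2^32 to undo the two's complement wrap
  Result := Integer(V);
end;

begin
  Writeln(HexToSigned32('FFFFF5D6'));  // -2602
  Writeln(HexToSigned32('00000A2A'));  // 2602
end.
```

Going through Int64 keeps the intermediate value in range, so the sketch does not depend on how a particular compiler version treats 32-bit overflow.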

If you want to separate the exponent and the significand, you can use a variant record:
TExtRec = packed record
case Boolean of
false:
(AValue: Extended);
true:
(ASignificand: uint64; ASignExp: uint16;)
end;
I think this helps to understand the structure of the floating point number.
Example usage:
var
r: TExtRec;
begin
r.AValue := 123.567;
ShowMessage(IntToHex(r.ASignExp) + IntToHex(r.ASignificand));
end;
Output:
4005F7224DD2F1A9FBE7
You can calculate it back:
v = (-1)^s * 2^(e-16383) * (i.f)
With
e = $4005 = 16389 and
i.f = $F7224DD2F1A9FBE7
i.f = 1.930734374999999999954029827886614611998084001243114471435546875
v=123.566999999999999997057908984743335167877376079559326171875
To convert i.f, I've used a binary converter.

Trunc() function

Look at the following code. Why is the result of the Trunc function different?
procedure TForm1.Button1Click(Sender: TObject);
var
D: Double;
E: Extended;
I: Int64;
begin
D := Frac(101 / 100) * 100;
E := Frac(101 / 100) * 100;
I := Trunc(D);
ShowMessage('Trunc(Double): ' + IntToStr(I)); // Trunc(Double): 1
I := Trunc(E);
ShowMessage('Trunc(Extended): ' + IntToStr(I)); // Trunc(Extended): 0
end;

Formatting functions don't always display the actual numbers (data).
Real numbers and precision can be tricky.
Check out this code where I use more precision on what I want to see on the screen:
D := Frac(101 / 100);
E := Frac(101 / 100);
ShowMessage(FloatToStrF(D, ffFixed, 15, 20));
ShowMessage(FloatToStrF(E, ffFixed, 18, 20));
It appears that D is something like 0.010000000000 while E is like 0.00999999999.
Edit: Extended type has better precision than Double type.
If we try to display the values of D and E with FloatToStr(), we'll probably get the same result, even though the actual values are not the same.

Note Nick D’s answer. He is right when saying that
It appears that D is something like
0.010000000000 while E is like 0.00999999999.
The answer, however, is not in the formatting function. This is how floating point calculations are done. Computers cannot represent most real numbers exactly (there are infinitely many numbers between 0 and 1, while computers operate on a finite number of bits and bytes), so almost every Double or Extended value in Delphi (and most other languages) is just an approximation. Only values that happen to be exactly representable in binary, such as 100 or 0.5, are stored exactly.
You can read more of it on Wikipedia: Floating point and Fixed-point
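If the intent is to get 1 in both cases, one common workaround is to snap values that lie within a small tolerance of an integer before truncating. The following is a sketch under that assumption; TruncWithTolerance and the tolerance 1E-9 are illustrative choices, not RTL functions.

```pascal
program TruncDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: if X is within Tol of a whole number, return that
// whole number; otherwise fall back to the ordinary Trunc.
function TruncWithTolerance(const X, Tol: Extended): Int64;
var
  Nearest: Int64;
begin
  Nearest := Round(X);
  if Abs(X - Nearest) <= Tol then
    Result := Nearest
  else
    Result := Trunc(X);
end;

var
  D: Double;
  E: Extended;
begin
  D := Frac(101 / 100) * 100;
  E := Frac(101 / 100) * 100;
  // Both values are within 1E-9 of 1, so both lines print 1
  Writeln(TruncWithTolerance(D, 1E-9));
  Writeln(TruncWithTolerance(E, 1E-9));
end.
```

The tolerance has to be chosen for the data at hand: too large and genuine fractions such as 0.999 get rounded up, too small and accumulated error slips through.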
