Look at the following code. Why is the result of the Trunc function different?
procedure TForm1.Button1Click(Sender: TObject);
var
D: Double;
E: Extended;
I: Int64;
begin
D := Frac(101 / 100) * 100;
E := Frac(101 / 100) * 100;
I := Trunc(D);
ShowMessage('Trunc(Double): ' + IntToStr(I)); // Trunc(Double): 1
I := Trunc(E);
ShowMessage('Trunc(Extended): ' + IntToStr(I)); // Trunc(Extended): 0
end;
Formatting functions don't always display the actual numbers (data).
Real numbers and precision can be tricky.
Check out this code where I use more precision on what I want to see on the screen:
D := Frac(101 / 100);
E := Frac(101 / 100);
ShowMessage(FloatToStrF(D, ffFixed, 15, 20));
ShowMessage(FloatToStrF(E, ffFixed, 18, 20));
It appears that D is something like 0.010000000000 while E is like 0.00999999999.
Edit: Extended type has better precision than Double type.
If we try to display the values of D and E with FloatToStr() we'll probably get the same result, even though the actual values are not the same.
Note Nick D’s answer. He is right when saying that
It appears that D is something like 0.010000000000 while E is like 0.00999999999.
The answer, however, does not lie in the formatting functions. This is how floating-point calculations work. There are infinitely many real numbers between 0 and 1, while a computer has only a finite number of bits to work with, so a Double or Extended variable in Delphi (and in most other languages) usually holds only an approximation of the value you assigned; only values that happen to be exactly representable in binary are stored exactly.
You can read more about it on Wikipedia: Floating point and Fixed-point.
Take the following record:
TVector2D = record
public
class operator Equal(const V1, V2: TVector2D): Boolean;
class operator Multiply(const D: Accuracy; const V: TVector2D): TVector2D;
class operator Divide(const V: TVector2D; const D: Accuracy): TVector2D;
class function New(const x, y: Accuracy): TVector2D; static;
function Magnitude: Accuracy;
function Normalised: TVector2D;
public
x, y: Accuracy;
end;
With the methods defined as:
class operator TVector2D.Equal(const V1, V2: TVector2D): Boolean;
var
A, B: Boolean;
begin
Result := (V1.x = V2.x) and (V1.y = V2.y);
end;
class operator TVector2D.Multiply(const D: Accuracy; const V: TVector2D): TVector2D;
begin
Result.x := D*V.x;
Result.y := D*V.y;
end;
class operator TVector2D.Divide(const V: TVector2D; const D: Accuracy): TVector2D;
begin
Result := (1.0/D)*V;
end;
class function TVector2D.New(const x, y: Accuracy): TVector2D;
begin
Result.x := x;
Result.y := y;
end;
function TVector2D.Magnitude;
begin
RESULT := Sqrt(x*x + y*y);
end;
function TVector2D.Normalised: TVector2D;
begin
Result := Self/Magnitude;
end;
and a constant:
const
jHat2D : TVector2D = (x: 0; y: 1);
I would expect the Boolean value of (jHat2D = TVector2D.New(0,0.707).Normalised) to be True. Yet it comes out as False.
In the debugger TVector2D.New(0,0.707).Normalised.y shows as exactly 1.
It cannot be the case that this is exactly 1, otherwise the Boolean value of (jHat2D = TVector2D.New(0,0.707).Normalised) would be True.
Any ideas?
Edit
Accuracy is a Type defined as: Accuracy = Double
Assuming that Accuracy is a synonym for a Double type, this is a bug in the visualization of floating point values by the debugger. Due to the inherent problems with internal representation of floating points, v1.Y and v2.Y have very slightly different values, though both approximate to 1.
Add watches for v1.y and v2.y. Ensure that these watch values are configured to represent as "Floating Point" values with Digits set to 18 for maximum detail.
At your breakpoint you will see that:
v1.y = 1
v2.y = 0.999999999999999889
(whosrdaddy provided the above short version in the comments on the question, but I am retaining the long form of my investigation - see below the line after Conclusion - as it may prove useful in other, similar circumstances as well as being of potential interest)
Conclusion
Whilst the debugger visualizations are, strictly speaking, incorrect (or at best misleading), they are nevertheless very nearly correct. :)
The question then is whether you require strict accuracy or accuracy to within a certain tolerance. If the latter, you can use SameValue() with an EPSILON suited to the degree of accuracy you require.
Otherwise you must accept that when debugging your code you cannot rely on the debugger to represent the values involved in your debugging to the degree of accuracy relied on in the code itself.
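As a minimal sketch of the SameValue() approach, the Equal operator could compare each component within a tolerance. The record here is cut down to the two fields, and VectorEpsilon and its value of 1E-12 are assumptions chosen for illustration, not something from the question:

```pascal
program VectorCompare;
{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

type
  TVector2D = record
    x, y: Double;
    class operator Equal(const V1, V2: TVector2D): Boolean;
  end;

const
  // Hypothetical tolerance; choose one that suits your data.
  VectorEpsilon: Double = 1E-12;

class operator TVector2D.Equal(const V1, V2: TVector2D): Boolean;
begin
  // Components count as equal when they differ by no more than VectorEpsilon.
  Result := SameValue(V1.x, V2.x, VectorEpsilon) and
            SameValue(V1.y, V2.y, VectorEpsilon);
end;

var
  A, B: TVector2D;
begin
  A.x := 0; A.y := 1;
  B.x := 0; B.y := 0.9999999999999999; // just below 1
  Writeln(BoolToStr(A = B, True));     // the two vectors now compare as equal
end.
```

Bear in mind that a tolerance-based Equal is no longer transitive, so it may not be suitable wherever strict equality semantics are relied on.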
Option: Customise the Debug Visualization Itself
Alternatively you may wish to investigate creating a custom debug visualisation for your TVector2D type to represent your x/y values to the accuracy employed in your code.
For such a visualization, rather than using FloatToStr(), use Format() with a %f format specifier and a suitable number of decimal places. For example, the call below yields the same result as watching the variable as described above:
Format('%.18f', [v2.y]);
// Yields 0.999999999999999889
Long Version of Original Investigation
I modified the Equal operator to allow me to inspect the internal representation of the two values v1.y and v2.y:
type
  PAccuracy = ^Accuracy;
class operator TVector2D.Equal(const V1, V2: TVector2D): Boolean;
var
  A, B: Boolean;
  ay, by: PAccuracy;
begin
  // Take the addresses of the two y values so they can be inspected as raw memory
  ay := @V1.y;
  by := @V2.y;
  A := (V1.x = V2.x);
  B := (V1.y = V2.y);
  result := A and B;
end;
By setting watches in the debugger to provide a Memory Dump of ay^ and by^ we see that the two values are represented internally very differently:
v1.y : $3f f0 00 00 00 00 00 00
v2.y : $3f ef ff ff ff ff ff ff
NOTE: Byte order is reversed in the watch value results, as compared to the actual values above, due to the Little Endian nature of Intel.
We can then test the hypothesis by passing Doubles with these internal representations into FloatToStr():
var
  a: Double;
  b: Double;
  ai: Int64 absolute a; // overlays the bits of a
  bi: Int64 absolute b; // overlays the bits of b
  s: string;
begin
  ai := $3ff0000000000000;
  bi := $3fefffffffffffff;
  s := FloatToStr(a) + ' = ' + FloatToStr(b);
  // Yields 's' = '1 = 1';
end;
We can conclude therefore that the evaluation of B is correct. v1.y and v2.y are different. The representation of the Double values by the debugger is incorrect (or at best misleading).
By changing the expression for B to use SameValue() we can determine the deviation between the values involved:
uses
Math;
const
EPSILON = 0.1;
B := SameValue(V1.y, V2.y, EPSILON);
By progressively reducing the value of EPSILON we find that v1.y and v2.y differ by less than 0.000000000000001 but by more than 0.0000000000000001, since:
EPSILON = 0.000000000000001; // Yields B = TRUE
EPSILON = 0.0000000000000001; // Yields B = FALSE
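The size of the gap can also be computed directly from the two bit patterns shown in the memory dump. This is only a small sketch (the program name is made up); it reuses the absolute overlay from the test above:

```pascal
program UlpDifference;
{$APPTYPE CONSOLE}

var
  a, b: Double;
  ai: Int64 absolute a; // overlays the bits of a
  bi: Int64 absolute b; // overlays the bits of b
begin
  ai := $3FF0000000000000; // bit pattern of 1.0
  bi := $3FEFFFFFFFFFFFFF; // the largest Double below 1.0
  // The subtraction is exact here: the result is 2^-53, about 1.11E-16,
  // which lies between the two EPSILON values tried above.
  Writeln(a - b);
end.
```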
Your problem stems from the fact that the two floating-point values are not exactly equal and that the Debug Inspector rounds the value it displays. To see the real value, you need to add a watch and specify floating point as the visualizer.
Using the memory dump visualizer also reveals the difference between the two values.
Because of a documented rounding issue in Delphi XE2, we are using a special rounding unit available on the Embarcadero site, named DecimalRounding_JH1, to achieve true banker's rounding. A link to the unit can be found here:
DecimalRounding_JH1
Using this unit's DecimalRound function with numbers containing a large number of decimal place we
This is the rounding routine from the DecimalRounding_JH1 unit. In our example we call this DecimalRound function with the following parameters (166426800, 12, MaxRelErrDbl, drHalfEven) where maxRelErrDbl = 2.2204460493e-16 * 1.234375 * 2
Function DecimalRound(Value: extended; NDFD: integer; MaxRelErr: double;
Ctrl: tDecimalRoundingCtrl = drHalfEven): extended;
{ The DecimalRounding function is for doing the best possible job of rounding
floating binary point numbers to the specified (NDFD) number of decimal
fraction digits. MaxRelErr is the maximum relative error that will allowed
when determining when to apply the rounding rule. }
var i64, j64: Int64; k: integer; m, ScaledVal, ScaledErr: extended;
begin
If IsNaN(Value) or (Ctrl = drNone)
then begin Result := Value; EXIT end;
Assert(MaxRelErr > 0,
'MaxRelErr param in call to DecimalRound() must be greater than zero.');
{ Compute 10^NDFD and scale the Value and MaxError: }
m := 1; For k := 1 to abs(NDFD) do m := m*10;
If NDFD >= 0
then begin
ScaledVal := Value * m;
ScaledErr := abs(MaxRelErr*Value) * m;
end
else begin
ScaledVal := Value / m;
ScaledErr := abs(MaxRelErr*Value) / m;
end;
{ Do the diferent basic types separately: }
Case Ctrl of
drHalfEven: begin
i64 := round((ScaledVal - ScaledErr));
The last line is where we get a floating point error.
Any thoughts on why this error is occurring?
If you get an exception, that means your value cannot be represented as a double within the specified error range.
In other words, maxRelErrDbl is too small.
Try maxRelErrDbl = 0.0000000001 or something similar to test whether I am right.
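Independently of the tolerance, it may be worth checking the magnitude of the scaled value. Round returns an Int64, and with the parameters quoted in the question the scaled value is 166426800 * 10^12, roughly 1.66E20, which is well above High(Int64) (about 9.22E18). The sketch below uses a hypothetical helper, FitsInInt64, to test this before Round is called; it is not part of the DecimalRounding_JH1 unit:

```pascal
program RoundRangeCheck;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: True when Value lies within the Int64 range,
// which is what Round needs in order not to overflow.
function FitsInInt64(const Value: Extended): Boolean;
begin
  Result := (Value >= Low(Int64)) and (Value <= High(Int64));
end;

var
  Scaled: Extended;
begin
  // Value * 10^NDFD for the call in the question: 166426800 with NDFD = 12
  Scaled := 166426800.0 * 1E12;
  Writeln(BoolToStr(FitsInInt64(Scaled), True)); // False: too large for Int64
end.
```

If this is indeed the cause, a smaller NDFD for values of this size would keep the scaled value within range.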
In the following code, the last two calls to Ceil give an unexpected result. Could you help explain the reason?
Furthermore, if the error (or deviation) is random, how can I get the expected value?
I expect Ceil(Calculated_Var_Value) = 7 when Calculated_Var_Value = 7.0000000000.
Many thanks!
procedure TForm2.FormCreate(Sender: TObject);
var
A, B, C: Extended;
Val: Extended;
begin
ShowMessage(FloatToStr((1.8 - 2.5) / -0.1));
ShowMessage(FloatToStrF((1.8 - 2.5) / -0.1, ffFixed, 20, 20));
ShowMessage(FloatToStr(Ceil((1.8 - 2.5) / -0.1)));
Val := (1.8 - 2.5) / -0.1;
ShowMessage(FloatToStr(Val));
ShowMessage(FloatToStrF(Val, ffFixed, 20, 20));
ShowMessage(FloatToStr(Ceil(Val)));
Val := (1.8 - 2.5) / -0.1;
ShowMessage(FloatToStr(Val * 100 / 100));
ShowMessage(FloatToStrF(Val * 100 / 100, ffFixed, 20, 20));
ShowMessage(FloatToStr(Ceil(Val * 100 / 100)));
A := 1.8; B := 2.5; C := -0.1;
Val := (A - B) / C;
ShowMessage(FloatToStr(Val));
ShowMessage(FloatToStrF(Val, ffFixed, 20, 20));
ShowMessage(FloatToStr(Ceil(Val)));
A := 1.8; B := 2.5; C := -0.1;
Val := (A - B) / C;
ShowMessage(FloatToStr(Val * 100 / 100));
ShowMessage(FloatToStrF(Val * 100 / 100, ffFixed, 20, 20));
ShowMessage(FloatToStr(Ceil(Val * 100 / 100)));
end;
This is just down to the inherent inaccuracy of floating point arithmetic. Two of your values are not exactly representable in binary floating point, 1.8 and -0.1. So, those numbers are approximated by the closest representable values. And that means that it's quite plausible that your equation won't evaluate to exactly 7.
Now consider your two expressions:
Val1 := (1.8 - 2.5) / -0.1;
Val2 := (A - B) / C;
The difference between these two is that Val1 is evaluated at compile time and Val2 is evaluated at runtime. Now, it's down to the compiler how the constant expression (1.8 - 2.5) / -0.1 is evaluated. To the best of my knowledge, it's not documented how that will be evaluated.
However, it is clear that the compiler uses a different evaluation method to evaluate the constant expression from that used at runtime. This program illustrates that:
{$APPTYPE CONSOLE}
uses
SysUtils, Math;
var
A, B, C: Extended;
Val1, Val2: Extended;
begin
Val1 := (1.8 - 2.5) / -0.1;
Writeln(Ceil(Val1));
A := 1.8; B := 2.5; C := -0.1;
Val2 := (A - B) / C;
Writeln(Ceil(Val2));
Writeln(BoolToStr(Val1=7.0, True));
Writeln(BoolToStr(Val2=7.0, True));
Writeln(BoolToStr(Val1<Val2, True));
Readln;
end.
Output:
7
8
True
False
True
So, this shows that Val1 and Val2 have different values, and that Val2 is strictly greater than 7.
The fundamental problem you have is the representability of floating point values. Because you are using Extended, which uses a binary representation, your decimal input values are not exactly representable. If you want exact arithmetic here, you will need to use a decimal representation.
As always when answering variants of this question, I refer you to the essential reading on the subject: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
This is a typical rounding error. You can see it when you add this line to your output:
ShowMessage(FloatToStr(Val-7));
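If exact decimal arithmetic is not an option, one possible workaround is to snap the value to the nearest whole number when it is within a small tolerance, and only otherwise call Ceil. The helper name CeilWithTolerance and the default tolerance of 1E-9 are assumptions made for this sketch; the right tolerance depends on your data:

```pascal
program CeilTolerance;
{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

// Hypothetical helper: returns the nearest whole number when Value is within
// Epsilon of it, and Ceil(Value) otherwise.
function CeilWithTolerance(const Value: Extended;
  const Epsilon: Extended = 1E-9): Int64;
var
  Nearest: Extended;
begin
  Nearest := Round(Value);
  if Abs(Value - Nearest) <= Epsilon then
    Result := Round(Value)
  else
    Result := Ceil(Value);
end;

var
  A, B, C: Extended;
begin
  A := 1.8; B := 2.5; C := -0.1;
  // The quotient is only approximately 7, but it is well within 1E-9 of it.
  Writeln(CeilWithTolerance((A - B) / C)); // 7
end.
```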
We are facing an issue when comparing values of type Double:
if(p > pmax) then
begin
Showmessage('');
end
Even if both values are 100 (p = 100 and pmax = 100), execution still enters the if clause.
The Math.pas unit includes functions such as SameValue(), IsZero(), CompareValue() which handle floating type comparisons and equality.
const
EPSILON = 0.0000001;
begin
if CompareValue(p, pMax, EPSILON) = GreaterThanValue then
ShowMessage('p greater than pMax');
The constant GreaterThanValue is defined in Types.pas
If you're comparing very large values you shouldn't use a constant for epsilon, instead your epsilon value should be calculated based on the values you're comparing.
var
epsilon: double;
begin
epsilon := Max(Min(Abs(p), Abs(pMax)) * 0.000001, 0.000001);
if CompareValue(p, pMax, epsilon) = GreaterThanValue then
ShowMessage('p greater than pMax');
Note that if you use CompareValue(a, b, 0) or, in XE2 and later, CompareValue(a, b), Delphi will automatically fill in a good epsilon for you.
From the Delphi Math unit:
function SameValue(const A, B: Extended; Epsilon: Extended): Boolean;
begin
if Epsilon = 0 then
Epsilon := Max(Min(Abs(A), Abs(B)) * ExtendedResolution, ExtendedResolution);
if A > B then
Result := (A - B) <= Epsilon
else
Result := (B - A) <= Epsilon;
end;
As of Delphi XE2 there are now overloads for all these functions that do not require an epsilon parameter and instead calculate one for you (similar to passing a 0 value for epsilon). For code clarity I would recommend calling these simpler functions and let Delphi handle the epsilon.
The only reason not to use the overloads without epsilon parameters would be when performance is crucial and you want to avoid the overhead of having the epsilon repeatedly calculated.
There are several problems with comparing Doubles. One problem is that what you see is not exactly what you get due to rounding. You can have 99.999999996423 and 100.00000000001632, which are both rounded to 100, but they are not equal.
The solution is to use a margin so that, if the difference of the two Doubles lies within the margin, you accept them as equal.
You can create an IsEqual function using the margin as an optional parameter:
function IsEqual(const ANumber1, ANumber2: Double; const AMargin: Double = cMargin): Boolean;
begin
Result := Abs(ANumber1-ANumber2) <= AMargin;
end;
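Applied to the p > pmax comparison from the question, the same margin idea might look like the sketch below. The value of cMargin and the IsGreater helper are assumptions added for illustration; the sample value is one of those mentioned above:

```pascal
program MarginCompare;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Hypothetical default margin; choose a value that suits your data.
  cMargin = 1E-9;

function IsEqual(const ANumber1, ANumber2: Double;
  const AMargin: Double = cMargin): Boolean;
begin
  Result := Abs(ANumber1 - ANumber2) <= AMargin;
end;

// Hypothetical companion to IsEqual: True only when ANumber1 exceeds
// ANumber2 by more than the margin.
function IsGreater(const ANumber1, ANumber2: Double;
  const AMargin: Double = cMargin): Boolean;
begin
  Result := (ANumber1 - ANumber2) > AMargin;
end;

var
  p, pmax: Double;
begin
  p := 100.00000000001632; // displays as 100 when rounded
  pmax := 100;
  Writeln(BoolToStr(p > pmax, True));           // True: plain comparison
  Writeln(BoolToStr(IsGreater(p, pmax), True)); // False: within the margin
  Writeln(BoolToStr(IsEqual(p, pmax), True));   // True: within the margin
end.
```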
I have selected columns from a database table and want this data with two decimal places only. I have:
SQL.Strings = ('select '#9'my_index '#9'his_index,'...
What is that #9?
How can I deal with the data I selected to make it only keep two decimal places?
I am very new to Delphi.
#9 is the character with code 9, TAB.
If you want to convert a floating point value to a string with 2 decimal places you use one of the formatting functions, e.g. Format():
var
d: Double;
s: string;
...
d := Sqrt(2.0);
s := Format('%.2f', [d]);
function Round2(aValue:double):double;
begin
Round2:=Round(aValue*100)/100;
end;
#9 is the tab character.
If f is a floating-point variable, you can do FormatFloat('#.##', f) to obtain a string representation of f with no more than 2 decimals.
For N places behind the separator use
function round_n(f: double; n: nativeint): double;
var i, m: nativeint;
begin
  // m = 10^n; starting from 1 means n = 0 rounds to a whole number
  m := 1;
  for i := 1 to n do
    m := m * 10;
  result := round(f * m) / m;
end;
For float-to-float rounding (to 2 decimal places, say), see RoundTo in the documentation, which also gives plenty of examples. It uses banker's rounding.
x := RoundTo(1.235, -2); //gives 1.24
Note that there is a difference between formatting a value for display with two decimal places (as Format() does, producing a string), rounding to an integer, and rounding to a float.
Nowadays the SysUtils unit contains the solution:
System.SysUtils.FloatToStrF( singleValue, ffFixed, 7, 2 );
System.SysUtils.FloatToStrF( doubleValue, ffFixed, 15, 2 );
You can pass an additional TFormatSettings parameter if the required decimal or thousands separator differs from the current system locale settings.
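A minimal sketch of the TFormatSettings overload follows. It assumes Delphi XE or later, where TFormatSettings.Create is available; the sample value is made up:

```pascal
program FixedFormat;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  FS: TFormatSettings;
begin
  // Start from the current locale defaults, then force a '.' decimal separator.
  FS := TFormatSettings.Create;
  FS.DecimalSeparator := '.';
  // ffFixed with Digits = 2 gives two decimal places regardless of locale.
  Writeln(FloatToStrF(1234.5678, ffFixed, 15, 2, FS)); // 1234.57
end.
```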
The built-in float formatting routines only work well with simple numbers greater than 1.
You need to do something more complicated for a general-purpose decimal place limiter that works correctly both on fixed-point values and on values < 1 in scientific notation.
I use this routine:
function TForm1.Flt2str(Avalue:double; ADigits:integer):string;
var v:double; p:integer; e:string;
begin
if abs(Avalue)<1 then
begin
result:=floatTostr(Avalue);
p:=pos('E',result);
if p>0 then
begin
e:=copy(result,p,length(result));
setlength(result,p-1);
v:=RoundTo(StrToFloat(result),-Adigits);
result:=FloatToStr(v)+e;
end else
result:=FloatToStr(RoundTo(Avalue,-Adigits));
end
else
result:=FloatToStr(RoundTo(Avalue,-Adigits));
end;
So, with digits=2, 1.2349 rounds to 1.23 and 1.2349E-17 rounds to 1.23E-17
This worked for me:
Function RoundingUserDefineDecaimalPart(FloatNum: Double; NoOfDecPart: integer): Double;
Var
ls_FloatNumber: String;
Begin
ls_FloatNumber := FloatToStr(FloatNum);
IF Pos('.', ls_FloatNumber) > 0 Then
Result := StrToFloat
(copy(ls_FloatNumber, 1, Pos('.', ls_FloatNumber) - 1) + '.' + copy
(ls_FloatNumber, Pos('.', ls_FloatNumber) + 1, NoOfDecPart))
Else
Result := FloatNum;
End;
Function RealFormat(FloatNum: Double): string;
Var
ls_FloatNumber: String;
Begin
ls_FloatNumber:=StringReplace(FloatToStr(FloatNum),',','.',[rfReplaceAll]);
IF Pos('.', ls_FloatNumber) > 0 Then
Result :=
(copy(ls_FloatNumber, 1, Pos('.', ls_FloatNumber) - 1) + '.' + copy
(ls_FloatNumber, Pos('.', ls_FloatNumber) + 1, 2))
Else
Result := FloatToStr(FloatNum);
End;