double rounding - delphi

This code rounds 62 to 61 and shows it in the output. Why does it decide to round, and how do I get 62 in the output?
var
  d: double;
  i: integer;
begin
  d := 0.62;
  i := trunc(d*100);
  Showmessage(inttostr(i));
end;

This boils down to the fact that 0.62 is not exactly representable in a binary floating point data type. The closest representable double value to 0.62 is:
0.61999 99999 99999 99555 91079 01499 37383 83054 73327 63671 875
When you multiply this value by 100, the resulting value is slightly less than 62. What happens next depends on how the intermediate value d*100 is treated. In your program, under the 32 bit Windows compiler with default settings, the intermediate value is held in an 80 bit extended register. And the closest 80 bit extended precision value is:
61.99999 99999 99999 55591 07901 49937 38383 05473 32763 67187 5
Since the value is less than 62, Trunc returns 61 since Trunc rounds towards zero.
If you stored d*100 in a double value, then you'd see a different result.
d := 0.62;
d := d*100;
i := Trunc(d);
Writeln(i);
This program outputs 62 rather than 61. That's because although d*100 to extended 80 bit precision is less than 62, the closest double precision value to that 80 bit value is in fact 62.
Similarly, if you compile your original program with the 64 bit compiler, then arithmetic is performed in the SSE unit which has no 80 bit registers. And so there is no 80 bit intermediate value and your program outputs 62.
Or, going back to the 32 bit compiler, you can arrange for intermediate values to be held at 64 bit (double) precision on the FPU and also achieve an output of 62. Call Set8087CW($1232) to achieve that.
As you can see, binary floating point arithmetic can sometimes be surprising.
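The effect can be reproduced outside Delphi. What follows is a minimal Python 3 sketch, not Delphi code: Python floats are IEEE doubles, and fractions.Fraction is used here as a stand-in for the wider intermediate by carrying the exact product with no rounding at all. The variable names are made up for illustration.

```python
from fractions import Fraction
import math

d = 0.62                            # stored as the nearest double, slightly below 0.62
exact_product = Fraction(d) * 100   # exact value of d*100, never rounded

# The exact product is below 62, so truncating it gives 61.
trunc_exact = math.trunc(exact_product)

# Rounding that same product to the nearest double lands exactly on 62.0,
# so truncating the double gives 62.
as_double = float(exact_product)
trunc_double = math.trunc(as_double)

print(exact_product < 62, trunc_exact)   # True 61
print(as_double == 62.0, trunc_double)   # True 62
print(math.trunc(d * 100))               # 62 (double arithmetic throughout)
```

So whether you see 61 or 62 depends only on whether the product is rounded to double before it is truncated.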
If you use Round rather than Trunc then the value returned will be the closest integer, rather than rounding towards zero as Trunc does.
But perhaps a better solution would be to use a decimal data type rather than a binary data type. If you do that then you can represent 0.62 exactly and thereby avoid all such problems. Delphi's built in decimal real valued data type is Currency.
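Python has no Currency type, but its decimal module illustrates the same idea of base-10 storage. A minimal sketch, assuming the value arrives as text rather than as a binary float:

```python
from decimal import Decimal

d = Decimal("0.62")    # held exactly in base 10
scaled = d * 100       # exactly 62.00, no binary rounding involved
i = int(scaled)        # int() truncates toward zero, like Trunc

print(scaled, i)       # 62.00 62
```

Because 0.62 is exact in a decimal type, truncation and rounding agree here.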

Use round instead of trunc.
round will round towards the nearest integer, and 62.00 is very close to 62, so there is no problem. trunc will round to the nearest integer towards zero, and 62.00 is very close to 61.9999999 so numeric 'fuzz' might very well cause the issue you describe.

Related

How to convert a floating point number to a string with max. 2 decimal digits in Delphi

How can I convert a floating point number to a string with a maximum of 2 decimal digits in Delphi7?
I've tried using:
FloatToStrF(Query.FieldByName('Quantity').AsFloat, ffGeneral, 18, 2, FS);
But with the above, sometimes more than 2 decimal digits are given back, ie. the result is: 15,60000009
Use ffFixed instead of ffGeneral.
ffGeneral ignores the Decimal parameter.
When you use ffGeneral, the 18 is saying that you want 18 significant decimal digits. The routine will then express that number in the shortest manner, using scientific notation if necessary. The 2 is ignored.
When you use ffFixed, you are saying you want 2 digits after the decimal point.
If you are wondering about why you sometimes get values that seem to be imprecise, there is much to be found on this site and others that will explain how floating-point numbers work.
In this case, AsFloat is returning a double, which like (most) other floating-point formats, stores its value in binary. In the same way that 1/3 cannot be written in decimal with finite digits, neither can 15.6 be represented in binary in a finite number of bits. The system chooses the closest possible value that can be stored in a double. The exact value, in decimal, is:
15.5999999999999996447286321199499070644378662109375
If you had asked for 16 digits of precision, the value would've been rounded off to 15.6. But you asked for 18 digits, so you get 15.5999999999999996.
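The same 16 versus 18 digit behaviour can be seen in any language that uses IEEE doubles. A small Python sketch, where the "g" format plays a role roughly comparable to ffGeneral and "f" to ffFixed (that correspondence is my analogy, not something the Delphi documentation states):

```python
x = 15.6   # stored as the nearest double, slightly below 15.6

sixteen = format(x, ".16g")    # 16 significant digits
eighteen = format(x, ".18g")   # 18 significant digits
fixed = format(x, ".2f")       # always 2 digits after the decimal point

print(sixteen)    # 15.6
print(eighteen)   # 15.5999999999999996
print(fixed)      # 15.60
```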
If you really mean what you write (MAX 2 decimal digits) and do not mean ALWAYS 2 decimal digits, then the two code snippets in the comments won't give you what you asked for: they will return a string that ALWAYS has two decimal digits, ie. ONE is returned as "1.00" (or "1,00" for Format, depending on your decimal point).
If you truly want an option with MAX 2 decimal digits, you'll have to do a little post-processing of the returned string.
FUNCTION FloatToStrMaxDecimals(F : Extended ; MaxDecimals : BYTE) : STRING;
  BEGIN
    Result:=Format('%.'+IntToStr(MaxDecimals)+'f',[F]);
    // With MaxDecimals=0 there is no fractional part, so trailing zeroes must be kept
    IF MaxDecimals=0 THEN EXIT;
    WHILE Result[LENGTH(Result)]='0' DO DELETE(Result,LENGTH(Result),1);
    IF Result[LENGTH(Result)] IN ['.',','] THEN DELETE(Result,LENGTH(Result),1)
  END;
An alternative (and probably faster) implementation could be:
FUNCTION FloatToStrMaxDecimals(F : Extended ; MaxDecimals : BYTE) : STRING;
  BEGIN
    Result:=Format('%.'+IntToStr(MaxDecimals)+'f',[F]);
    // With MaxDecimals=0 there is no fractional part, so trailing zeroes must be kept
    IF MaxDecimals=0 THEN EXIT;
    WHILE Result[LENGTH(Result)]='0' DO SetLength(Result,PRED(LENGTH(Result)));
    IF Result[LENGTH(Result)] IN ['.',','] THEN SetLength(Result,PRED(LENGTH(Result)))
  END;
This function will return a floating point number with MAX the number of specified decimal digits, ie. one half with MAX 2 digits will return "0.5" and one third with MAX 2 decimal digits will return "0.33" and two thirds with MAX 2 decimal digits will return "0.67". TEN with MAX 2 decimal digits will return "10".
The final IF statement should really test for the proper decimal point, but I don't think any value other than period or comma is possible, and if one of these is left as the last character in the string after having stripped all zeroes from the end, then it MUST be a decimal point.
Also note, that this code assumes that strings are indexed with 1 for the first character, as it always is in Delphi 7. If you need this code for the mobile compilers in newer Delphi versions, you'll need to update the code. I'll leave that exercise up to the reader :-).
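For comparison, the same post-processing idea can be sketched in Python 3. This is not a port of the Delphi routine; the function name is hypothetical, and Python's format always uses a period as the decimal point, so only that character is stripped.

```python
def float_to_str_max_decimals(value: float, max_decimals: int) -> str:
    """Format value with at most max_decimals digits after the decimal point."""
    text = format(value, f".{max_decimals}f")
    if "." in text:   # only strip zeroes when a fractional part exists
        text = text.rstrip("0").rstrip(".")
    return text


print(float_to_str_max_decimals(0.5, 2))     # 0.5
print(float_to_str_max_decimals(1 / 3, 2))   # 0.33
print(float_to_str_max_decimals(2 / 3, 2))   # 0.67
print(float_to_str_max_decimals(10, 2))      # 10
```

The check for a decimal point matters when max_decimals is 0: without it, the trailing zero of "10" would be stripped as well.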
I use this function in my application (Power requires the Math unit):
function sclCurrencyND(Const F: Currency; GlobalDegit: word = 2): Currency;
var Fact: Currency;
begin
  // Truncates (does not round) F to GlobalDegit decimal places
  Fact := Power(10, GlobalDegit);
  Result := Int(F*Fact)/Fact;
end;

StrToFloat and who is wrong: Delphi or ASE/SQL Server

Recently I found strange thing: result of
var
d: double;
begin
d := StrToFloat('-1.79E308');
is not the same as string value '-1.79E308' converted to float field type by ASE and SQL Server through
INSERT INTO my_table (my_float_field) VALUES (-1.79E308)
For Delphi memory dump is 9A BB AD 58 F1 DC EF FF
For ASE/SQL Server value in packet on select is 99 BB AD 58 F1 DC EF FF.
Who is wrong, both servers or Delphi?
The premise that we are working from is that StrToFloat yields the closest representable binary floating point value to the supplied decimal value.
The two hexadecimal values that you present are adjacent. You can see that they differ by 1 in the significand. Here is some Python 3 code that decodes the two values:
>>> import struct
>>> struct.unpack('!d', bytes.fromhex('ffefdcf158adbb9a'))[0]
-1.7900000000000002e+308
>>> struct.unpack('!d', bytes.fromhex('ffefdcf158adbb99'))[0]
-1.79e+308
Bear in mind that Python prints floating point values using the shortest possible significand for which the closest representable value is the actual value. That ffefdcf158adbb99 decodes to a value that prints as -1.79e+308 in the eyes of Python is sufficient proof that ffefdcf158adbb99 is the closest representable value. In other words, the Delphi code is giving the wrong answer.
And, just out of curiosity, in the opposite direction:
>>> hex(struct.unpack('<Q', struct.pack('<d', float('-1.79e308')))[0])
'0xffefdcf158adbb99'
It is interesting to note that the 32 bit Delphi compiler yields ffefdcf158adbb99 but the 64 bit Delphi compiler yields ffefdcf158adbb9a. This is a clear defect, and should be submitted as a bug report to Quality Portal.
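If you would rather not rely on how Python prints floats, the two candidates can be compared against the decimal literal with exact rational arithmetic. A Python 3 sketch; the helper name decode and the variable names are mine:

```python
import struct
from fractions import Fraction

target = Fraction(-179 * 10**306)   # exactly -1.79e308


def decode(hex_bytes: str) -> float:
    """Interpret 8 big-endian bytes, given as hex, as an IEEE double."""
    return struct.unpack(">d", bytes.fromhex(hex_bytes))[0]


candidate_9a = decode("ffefdcf158adbb9a")   # dump ending in 9A (read in reverse byte order)
candidate_99 = decode("ffefdcf158adbb99")   # dump ending in 99 (read in reverse byte order)

# Exact distance of each candidate from the decimal literal.
err_9a = abs(Fraction(candidate_9a) - target)
err_99 = abs(Fraction(candidate_99) - target)

print(err_99 < err_9a)                      # True
print(float("-1.79E308") == candidate_99)   # True
```

The value ending in 99 is strictly closer to -1.79E308, which agrees with the conclusion above.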

Unexpected result subtracting decimals in ruby [duplicate]

Can somebody explain why multiplying by 100 here gives a less accurate result but multiplying by 10 twice gives a more accurate result?
± % sc
Loading development environment (Rails 3.0.1)
>> 129.95 * 100
12994.999999999998
>> 129.95*10
1299.5
>> 129.95*10*10
12995.0
If you do the calculations by hand in double-precision binary, which is limited to 53 significant bits, you'll see what's going on:
129.95 = 1.0000001111100110011001100110011001100110011001100110 x 2^7
129.95*100 = 1.1001011000010111111111111111111111111111111111111111011 x 2^13
This is 56 significant bits long, so rounded to 53 bits it's
1.1001011000010111111111111111111111111111111111111111 x 2^13, which equals
12994.999999999998181010596454143524169921875
Now 129.95*10 = 1.01000100110111111111111111111111111111111111111111111 x 2^10
This is 54 significant bits long, so rounded to 53 bits it's 1.01000100111 x 2^10 = 1299.5
Now 1299.5 * 10 = 1.1001011000011 x 2^13 = 12995.
First off: you are looking at the string representation of the result, not the actual result itself. If you really want to compare the two results, you should format both results explicitly, using String#% and you should format both results the same way.
Secondly, that's just how binary floating point numbers work. They are inexact, they are finite and they are binary. All three mean that you get rounding errors, which generally look totally random, unless you happen to have memorized the entirety of IEEE754 and can recite it backwards in your sleep.
There is no floating point number exactly equal to 129.95. So your language uses a value which is close to it instead. When that value is multiplied by 100, the result is close to 12995, but it just so happens to not equal 12995. (It is also not exactly equal to 100 times the original value it used in place of 129.95.) So your interpreter prints a decimal number which is close to (but not equal to) the value of 129.95 * 100 and which shows you that it is not exactly 12995. It also just so happens that the result 129.95 * 10 is exactly equal to 1299.5. This is mostly luck.
Bottom line is, never expect equality out of any floating point arithmetic, only "closeness".
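The numbers in the question are not specific to Ruby. Here is a short Python 3 sketch (Python floats are the same IEEE doubles) that reproduces them and shows a tolerance-based comparison; math.isclose with its default tolerance is just one reasonable choice, not the only one.

```python
from decimal import Decimal
import math

product = 129.95 * 100
two_steps = 129.95 * 10 * 10

print(product)            # 12994.999999999998
print(Decimal(product))   # 12994.999999999998181010596454143524169921875
print(129.95 * 10)        # 1299.5
print(two_steps)          # 12995.0

# Compare with a tolerance instead of ==.
print(product == 12995)               # False
print(math.isclose(product, 12995))   # True
```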

Why delphi's RoundTo method behaves differently? [duplicate]

I expected that the result would be 87.29. I also tried SimpleRoundTo, but it produces the same result.
In the help there is also a "strange" example:
ms-help://embarcadero.rs2010/vcl/Math.RoundTo.html
RoundTo(1.235, -2) => 1.24
RoundTo(1.245, -2) => 1.24 //???
Does anybody know which function I need to get the result of 87.29? I mean: If the last digit >= 5 round up, if < 5 round down. As taught in the school :)
I use Delphi2010, and SetRoundMode(rmNearest). I also tried with rmTruncate.
The value 87.285 is stored in a double variable.
Also strange:
SimpleRoundTo(87.285, -2) => 87.29
but
x := 87.285; //double
SimpleRoundTo(x, -2) => 87.28
The exact value 87.285 is not representable as a floating-point value in Delphi. A page on my Web site shows what that value really is, as Extended, Double, and Single:
87.285 = + 87.28500 00000 00000 00333 06690 73875 46962 12708 95004 27246 09375
87.285 = + 87.28499 99999 99996 58939 48683 51519 10781 86035 15625
87.285 = + 87.28500 36621 09375
By default, floating-point literals in Delphi have type Extended, and as you can see, the Extended version of your number is slightly higher than 87.285, so it is correct that rounding to nearest would round up. But as a Double, the real number is slightly lower. That's why you get the number you expected if you explicitly store the number in a Double variable before calling RoundTo. There are overloads of that function for each of Delphi's floating-point types.
87.285 is not exactly representable and the nearest double is slightly smaller.
The classic reference on floating point is What Every Computer Scientist Should Know About Floating-Point Arithmetic.
For currency based calculations, if indeed this is, you should use a base 10 number type rather than base 2 floating point. In Delphi that means Currency.
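To make the difference concrete, here is a Python 3 sketch. It uses Python's decimal module as a stand-in for a base 10 type; it is not Delphi's Currency, and ROUND_HALF_UP is my choice for the "as taught in school" rule.

```python
from decimal import Decimal, ROUND_HALF_UP

x = 87.285                                   # binary double
below = Decimal(x) < Decimal("87.285")       # the stored double is slightly smaller
binary_result = round(x, 2)

d = Decimal("87.285")                        # exact in base 10
decimal_result = d.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(below)            # True
print(binary_result)    # 87.28
print(decimal_result)   # 87.29
```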

SetRoundMode(rmUp) and rounding "round" values like 10, results in 10,0001 how come?

This code:
SetRoundMode(rmUp);
Memo1.Lines.Add(CurrToStr(SimpleRoundTo(10)));
Results in 10,0001.
I simply don't get it.
I thought that rmUp would do something like, round 10,0001 to 11, but never 10 to 10,0001.
Can anyone explain why this happens?
Thanks.
SimpleRoundTo works like this:
1. Divide the input value by 10^x, where x is the (negative) number of decimal places to preserve in the result.
2. Add 0.5 to that quotient.
3. Truncate the sum.
4. Multiply by 10^x.
The result is a floating-point value. As with most floating-point values, the result will not be exact, even though in your case you start with an exact value. The number of decimal places specified for SimpleRoundTo is negative, so the divisor in step 1, for your example input, will ideally be 0.01. But that can't be represented exactly as a floating-point number, so when 10 / 0.01 is calculated in step 1, the result is not exactly 1000. The result in step 3 will be exactly 1000, though, so the inexactness of the division isn't important. The inexactness of the multiplication in step 4 is, though. That product won't be exact. It will be slightly higher than 10.
So SimpleRoundTo returns a slightly higher value, and since you've specified that rounding should go up, the conversion of the Extended result of SimpleRoundTo to the Currency input of CurrToStr results in exactly 10.0001.
Currency values are exact; they represent a fixed-point value, an integer scaled by four decimal places.
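The mechanism can be imitated in Python 3, with some loud caveats: this sketch uses doubles rather than Extended, it cannot change the FPU rounding mode, and it uses the decimal module's ROUND_CEILING to play the part of rmUp when converting to four decimal places. It only illustrates why a value a hair above 10 becomes 10.0001; it is not how the RTL is implemented.

```python
from decimal import Decimal, localcontext, ROUND_CEILING, ROUND_HALF_EVEN

factor = 0.01                         # 10^-2 as a double, slightly above 1/100
truncated = int(10 / factor + 0.5)    # steps 1 to 3: exactly 1000

with localcontext() as ctx:
    ctx.prec = 80                     # enough digits to keep the product exact
    exact = Decimal(truncated) * Decimal(factor)   # step 4 without any rounding

four_places = Decimal("0.0001")       # Currency-like scale: four decimal places
nearest = exact.quantize(four_places, rounding=ROUND_HALF_EVEN)
upward = exact.quantize(four_places, rounding=ROUND_CEILING)

print(exact > 10)   # True
print(nearest)      # 10.0000
print(upward)       # 10.0001
```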
i'd use the Round( ) function if banker's rounding is ok. it returns an integer.
if you don't like banker's rounding you can use this:
// use this to not get "banker's rounding"
function HumanRound(X: Extended): integer;
// Rounds a number "normally": if the fractional
// part is >= 0.5 the number is rounded up (see RoundUp)
// Otherwise, if the fractional part is < 0.5, the
// number is rounded down
// HumanRound(3.5) = 4 HumanRound(-3.5) = -4
// HumanRound(3.1) = 3 HumanRound(-3.1) = -3
begin
  // Trunc() does nothing except conv to integer. needed because return type of Int() is Extended
  Result := Trunc(Int(X) + Int(Frac(X) * 2));
end;
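here is the same trick as a Python 3 sketch, in case it helps to see it next to a language whose built-in round() also does banker's rounding. the name human_round is made up for this example.

```python
import math


def human_round(x: float) -> int:
    """Round half away from zero, mirroring Int(X) + Int(Frac(X) * 2)."""
    frac, whole = math.modf(x)   # both parts carry the sign of x
    return int(whole) + int(frac * 2)


print(human_round(3.5), human_round(-3.5))   # 4 -4
print(human_round(3.1), human_round(-3.1))   # 3 -3
print(human_round(2.5), round(2.5))          # 3 2
```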
my posting here is somewhat off-topic but still informative.
i looked into this at length since i needed to not be using banker's rounding. here are my findings. so far as i can see, this still doesn't get rid of banker's rounding
Value       Meaning
rmNearest   Rounds to the closest value.
rmDown      Rounds toward negative infinity.
rmUp        Rounds toward positive infinity.
rmTruncate  Truncates the value, rounding positive numbers down and negative numbers up.
rmNearest // default
0.500 0
1.500 2
2.450 2
2.500 2
2.550 3
3.450 3
3.500 4
3.550 4
rmDown
0.500 0
1.500 1
2.450 2
2.500 2
2.550 2
3.450 3
3.500 3
3.550 3
rmUp
0.500 1
1.500 2
2.450 3
2.500 3
2.550 3
3.450 4
3.500 4
3.550 4
rmTrunc
0.500 0
1.500 1
2.450 2
2.500 2
2.550 2
3.450 3
3.500 3
3.550 3
uses
  math, sysutils, clipbrd;
var
  s: string;

procedure trythis(sMode: string);
  procedure tryone(d: double);
  begin
    s := s + Format('%1.3f %d%s', [d, Round(d), #13+#10]);
  end;
begin
  s := s + #13#10 + sMode + #13#10;
  tryone(0.50);
  tryone(1.50);
  tryone(2.45);
  tryone(2.50);
  tryone(2.55);
  tryone(3.45);
  tryone(3.50);
  tryone(3.55);
end;

begin
  s := inttostr(integer(GetRoundMode));
  SetRoundMode(rmNearest);
  trythis('nearest');
  SetRoundMode(rmDown);
  trythis('down');
  SetRoundMode(rmUp);
  trythis('up');
  SetRoundMode(rmTruncate);
  trythis('trunc');
  clipboard.astext := s;
end.
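for anyone without a Delphi compiler at hand, the tables above can be reproduced with a Python 3 sketch. the mapping from the Delphi round modes to the decimal module's rounding constants is my own assumption, and the values are entered as decimal strings rather than doubles.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_FLOOR,
                     ROUND_CEILING, ROUND_DOWN)

# Assumed correspondence between Delphi round modes and decimal rounding constants.
MODES = {
    "rmNearest": ROUND_HALF_EVEN,
    "rmDown": ROUND_FLOOR,
    "rmUp": ROUND_CEILING,
    "rmTruncate": ROUND_DOWN,
}
VALUES = ["0.50", "1.50", "2.45", "2.50", "2.55", "3.45", "3.50", "3.55"]


def round_all(mode: str) -> list:
    """Round every test value to an integer under the given mode."""
    rounding = MODES[mode]
    return [int(Decimal(v).to_integral_value(rounding=rounding)) for v in VALUES]


for mode in MODES:
    print(mode, round_all(mode))
```

note that rmNearest still sends 0.5 and 2.5 to the even neighbour, which is the banker's rounding seen in the first table.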
The return value of SimpleRoundTo is also a Double, and such values can never be trusted to be exact. Truncating the value before converting it should do the work!
Memo1.Lines.Add(CurrToStr(Trunc(SimpleRoundTo(10))));
The Ceil() : Integer function should give you the answer you want for values > 0. If < 0 you may need to use floor() instead, depending on desired behaviour.

Resources