Why does Delphi's RoundTo method behave differently? [duplicate]

I expected the result to be 87.29, but RoundTo gives me 87.28. I also tried SimpleRoundTo, but it produces the same result.
In the help there is also a "strange" example:
ms-help://embarcadero.rs2010/vcl/Math.RoundTo.html
RoundTo(1.235, -2) => 1.24
RoundTo(1.245, -2) => 1.24 //???
Does anybody know which function I need to get a result of 87.29? I mean: if the last digit is >= 5, round up; if it is < 5, round down. As taught in school :)
I use Delphi 2010 and SetRoundMode(rmNearest). I also tried rmTruncate.
The value 87.285 is stored in a double variable.
Also strange:
SimpleRoundTo(87.285, -2) => 87.29
but
x := 87.285; //double
SimpleRoundTo(x, -2) => 87.28

The exact value 87.285 is not representable as a floating-point value in Delphi. A page on my Web site shows what that value really is, as Extended, Double, and Single:
Extended: 87.285 = + 87.28500 00000 00000 00333 06690 73875 46962 12708 95004 27246 09375
Double:   87.285 = + 87.28499 99999 99996 58939 48683 51519 10781 86035 15625
Single:   87.285 = + 87.28500 36621 09375
By default, floating-point literals in Delphi have type Extended, and as you can see, the Extended version of your number is slightly higher than 87.285, so it is correct that rounding to nearest rounds it up. But as a Double, the actual value is slightly lower, so rounding to nearest rounds it down. That's why the result changes when you explicitly store the number in a Double variable before calling RoundTo; there are overloads of that function for each of Delphi's floating-point types.
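A minimal console sketch of that difference (my own example, assuming the 32-bit Delphi 2010 compiler, where a bare literal is Extended; the outputs in the comments follow the behaviour described above):

uses
  Math, SysUtils;

var
  x: Double;
begin
  // The bare literal resolves to the Extended overload: slightly above 87.285
  Writeln(FloatToStr(RoundTo(87.285, -2)));  // 87.29
  // Stored in a Double first, the value is slightly below 87.285
  x := 87.285;
  Writeln(FloatToStr(RoundTo(x, -2)));       // 87.28
end.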

87.285 is not exactly representable and the nearest double is slightly smaller.
The classic reference on floating point is What Every Computer Scientist Should Know About Floating-Point Arithmetic.
For currency-based calculations, if that is indeed what you are doing, you should use a base-10 number type rather than base-2 floating point. In Delphi that means Currency.
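For instance (a sketch of my own), repeatedly adding 0.1 drifts in a Double but stays exact in Currency:

var
  d: Double;
  c: Currency;
  i: Integer;
begin
  d := 0.0;
  c := 0.0;
  for i := 1 to 10 do
  begin
    d := d + 0.1;  // 0.1 has no exact binary representation
    c := c + 0.1;  // 0.1 is exactly 1000 * 10^-4 in Currency
  end;
  Writeln(d = 1.0);  // FALSE
  Writeln(c = 1.0);  // TRUE
end.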

Related

Why does multiplying two doubles in Dart result in a very strange number?

Can anyone explain why the result is 252.99999999999997 and not 253? What should be used instead to get 253?
double x = 2.11;
double y = 0.42;
print(((x + y) * 100)); // print 252.99999999999997
I am basically trying to convert a currency value with 2 decimals (i.e. £2.11) into pence/cents (i.e. 211p).
Thanks
In short: Because many fractional double values are not precise, and adding imprecise values can give even more imprecise results. That's an inherent property of IEEE-754 floating point numbers, which is what Dart (and most other languages and the CPUs running them) are working with.
Neither of the rational numbers 2.11 and 0.42 is precisely representable as a double value. When you write 2.11 in source code, it means the actual double value that is closest to the mathematical number 2.11.
The value of 2.11 is precisely 2.109999999999999875655021241982467472553253173828125.
The value of 0.42 is precisely 0.419999999999999984456877655247808434069156646728515625.
As you can see, both are slightly smaller than the value you intended.
Then you add those two values, which gives the precise double result 2.529999999999999804600747665972448885440826416015625. This loses a few of the last digits of the 0.42 to rounding, and since both operands were already smaller than 2.11 and 0.42, the result is now even further below 2.53.
Finally you multiply that by 100, which gives the precise result 252.999999999999971578290569595992565155029296875.
This is different from the double value 253.0.
The double.toString method doesn't return a string of the exact value, but it does return different strings for different values, and since the result differs from 253.0, it must return a different string. It returns the shortest decimal number that is still closer to the actual result than to any adjacent double value, and that is the string you see.
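To actually get 253, round the scaled value to the nearest integer instead of expecting it to be exact; in Dart that is ((x + y) * 100).round(). A Delphi sketch of the same idea (my own example):

var
  x, y: Double;
  pence: Integer;
begin
  x := 2.11;
  y := 0.42;
  pence := Round((x + y) * 100);  // rounds 252.99999... up to the nearest integer
  Writeln(pence);                 // 253
end.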

Unexpected result subtracting decimals in Ruby [duplicate]

Can somebody explain why multiplying by 100 here gives a less accurate result but multiplying by 10 twice gives a more accurate result?
± % sc
Loading development environment (Rails 3.0.1)
>> 129.95 * 100
12994.999999999998
>> 129.95*10
1299.5
>> 129.95*10*10
12995.0
If you do the calculations by hand in double-precision binary, which is limited to 53 significant bits, you'll see what's going on:
129.95 = 1.0000001111100110011001100110011001100110011001100110 x 2^7
129.95*100 = 1.1001011000010111111111111111111111111111111111111111011 x 2^13
This is 56 significant bits long, so rounded to 53 bits it's
1.1001011000010111111111111111111111111111111111111111 x 2^13, which equals
12994.999999999998181010596454143524169921875
Now 129.95*10 = 1.01000100110111111111111111111111111111111111111111111 x 2^10
This is 54 significant bits long, so rounded to 53 bits it's 1.01000100111 x 2^10 = 1299.5
Now 1299.5 * 10 = 1.1001011000011 x 2^13 = 12995.
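You can reproduce the same three results outside Ruby; a Delphi sketch (my own), forcing each intermediate into a Double and printing enough digits to show the difference:

uses
  SysUtils;

var
  x, a, b: Double;
begin
  x := 129.95;
  a := x * 100;  // rounds to 12994.999999999998...
  b := x * 10;   // rounds to exactly 1299.5
  b := b * 10;   // 12995.0 exactly
  Writeln(Format('%.12f', [a]));  // 12994.999999999998
  Writeln(Format('%.12f', [b]));  // 12995.000000000000
end.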
First off: you are looking at the string representation of the result, not the actual result itself. If you really want to compare the two results, you should format both results explicitly, using String#% and you should format both results the same way.
Secondly, that's just how binary floating point numbers work. They are inexact, they are finite and they are binary. All three mean that you get rounding errors, which generally look totally random, unless you happen to have memorized the entirety of IEEE 754 and can recite it backwards in your sleep.
There is no floating point number exactly equal to 129.95. So your language uses a value which is close to it instead. When that value is multiplied by 100, the result is close to 12995, but it just so happens to not equal 12995. (It is also not exactly equal to 100 times the original value it used in place of 129.95.) So your interpreter prints a decimal number which is close to (but not equal to) the value of 129.95 * 100 and which shows you that it is not exactly 12995. It also just so happens that the result 129.95 * 10 is exactly equal to 1299.5. This is mostly luck.
Bottom line is, never expect equality out of any floating point arithmetic, only "closeness".
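In Delphi, for instance, the Math unit's SameValue does exactly this kind of tolerance-based comparison (a sketch; the epsilon of 1e-9 is my own choice):

uses
  Math, SysUtils;

var
  b: Double;
begin
  b := 129.95;
  b := b * 100;                          // 12994.999999999998...
  Writeln(b = 12995.0);                  // FALSE: exact equality fails
  Writeln(SameValue(b, 12995.0, 1e-9));  // TRUE: equal within a tolerance
end.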

double rounding

This code rounds 62 down to 61 and shows that in the output. Why does it round down, and how can I get 62 in the output?
var
  d: Double;
  i: Integer;
begin
  d := 0.62;
  i := Trunc(d * 100);
  ShowMessage(IntToStr(i));
end;
This boils down to the fact that 0.62 is not exactly representable in a binary floating point data type. The closest representable double value to 0.62 is:
0.61999 99999 99999 99555 91079 01499 37383 83054 73327 63671 875
When you multiply this value by 100, the resulting value is slightly less than 62. What happens next depends on how the intermediate value d*100 is treated. In your program, under the 32 bit Windows compiler with default settings, the intermediate value is held in an 80 bit extended register. And the closest 80 bit extended precision value is:
61.99999 99999 99999 55591 07901 49937 38383 05473 32763 67187 5
Since the value is less than 62, and Trunc rounds towards zero, Trunc returns 61.
If you stored d*100 in a double value, then you'd see a different result.
d := 0.62;
d := d*100;
i := Trunc(d);
Writeln(i);
This program outputs 62 rather than 61. That's because although d*100 to extended 80 bit precision is less than 62, the closest double precision value to that 80 bit value is in fact 62.
Similarly, if you compile your original program with the 64 bit compiler, then arithmetic is performed in the SSE unit which has no 80 bit registers. And so there is no 80 bit intermediate value and your program outputs 62.
Or, going back to the 32 bit compiler, you can arrange that intermediate values are stored to 64 bit precision on the FPU and also achieve an output of 62. Call Set8087CW($1232) to achieve that.
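A sketch of that variant (my own example; Set8087CW is declared in the System unit, and the control word value is the one given above):

var
  d: Double;
  i: Integer;
begin
  Set8087CW($1232);     // limit the x87 FPU to 53-bit (Double) precision
  d := 0.62;
  i := Trunc(d * 100);  // the intermediate is now kept at Double precision
  Writeln(i);           // 62 on the 32 bit compiler
end.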
As you can see, binary floating point arithmetic can sometimes be surprising.
If you use Round rather than Trunc then the value returned will be the closest integer, rather than rounding towards zero as Trunc does.
But perhaps a better solution would be to use a decimal data type rather than a binary data type. If you do that then you can represent 0.62 exactly and thereby avoid all such problems. Delphi's built in decimal real valued data type is Currency.
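A sketch of the Currency approach (my own example): 0.62 is held exactly as a scaled integer, so both the multiplication and the Trunc are exact.

var
  c: Currency;
  i: Integer;
begin
  c := 0.62;            // stored exactly as 6200 * 10^-4
  i := Trunc(c * 100);  // c * 100 is exactly 62
  Writeln(i);           // 62
end.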
Use Round instead of Trunc.
Round rounds to the nearest integer, and the computed value is very close to 62, so there is no problem. Trunc rounds towards zero, and the computed value is actually slightly less than 62 (61.99999...), so that numeric 'fuzz' causes exactly the issue you describe.
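That is, keeping the rest of the original code unchanged (a sketch):

var
  d: Double;
  i: Integer;
begin
  d := 0.62;
  i := Round(d * 100);  // nearest integer to 61.99999...: 62
  ShowMessage(IntToStr(i));
end;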

Objective-C ceil returns wrong value

NSLog(#"CEIL %f",ceil(2/3));
should return 1. However, it shows:
CEIL 0.000000
Why and how to fix that problem? I use ceil([myNSArray count]/3) and it returns 0 when array count is 2.
The same rules as C apply: 2 and 3 are ints, so 2/3 is an integer divide. Integer division truncates so 2/3 produces the integer 0. That integer 0 will then be cast to a double precision float for the call to ceil, but ceil(0) is 0.
Changing the code to:
NSLog(#"CEIL %f",ceil(2.0/3.0));
will display the result you're expecting. Adding the decimal point causes the constants to be recognised as double precision floating point numbers (and 2.0f is how you'd write a single precision floating point constant).
Maudicus' solution works because (float)2/3 casts the integer 2 to a float and C's promotion rules mean that it'll promote the denominator to floating point in order to divide a floating point number by an integer, giving a floating point result.
So, your current statement ceil([myNSArray count]/3) should be changed to either:
([myNSArray count] + 2)/3 // no floating point involved
Or:
ceil((float)[myNSArray count]/3) // arguably more explicit
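For what it's worth, the same trap exists in other languages. In Delphi (a sketch of my own), div is the truncating integer division and / always produces a real result:

uses
  Math;

begin
  Writeln(2 div 3);        // 0: integer division truncates towards zero
  Writeln(Ceil(2 / 3));    // 1: '/' always yields a real value
  Writeln((2 + 2) div 3);  // 1: integer-only ceiling, like (count + 2)/3 above
end.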
2/3 evaluates to 0 unless you cast one of the operands to a float.
So you have to be careful that your values are not truncated to ints before you intend them to be.
float decValue = (float) 2/3;
NSLog(#"CEIL %f",ceil(decValue));
==>
CEIL 1.000000
For your array example:
float decValue = (float) [myNSArray count]/3;
NSLog(@"CEIL %f", ceil(decValue));
It probably evaluates 2 and 3 as integers (which they are, obviously), computes the result (which is 0), and then converts it to float or double (which is also 0.000000). The easiest way to fix it is to write 2.0f/3, 2/3.0f, or 2.0f/3.0f (or without the "f" if you wish, whatever you like more ;) ).
Hope it helps

Small numbers in Objective C 2.0

I created a calculator class that does basic +,-, %, * and sin, cos, tan, sqrt and other math functions.
I have all the variables of type double, and everything works fine for big numbers, so I can calculate numbers like 1.35E122, but the problem is with extremely small numbers. For example, the calculation 1/98556321 gives me 0 where I would like to get something like 1.01464E-8.
Should I rewrite my code so that I only manipulate NSDecimalNumbers? And if so, what do I do with the sin and cos math functions that accept only double and long double values?
1/98556321
This division gives you 0 because integer division is performed here: the result is the integer part of the division. The following line should give you a floating-point result:
1/(double)98556321
integer/integer is always an integer, so you have to convert either the numerator or the denominator to a floating-point type:
(double)1/98556321
or
1/(double)98556321
which explicitly converts the number to double.
Happy coding....
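For comparison, a Delphi sketch (my own): / always performs real division, while div is the truncating integer division that produces the 0 here in C-family languages:

uses
  SysUtils;

begin
  Writeln(1 div 98556321);            // 0: integer division
  Writeln(FloatToStr(1 / 98556321));  // about 1.01464E-8
end.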
