Unexpected result subtracting decimals in ruby - ruby-on-rails

Can somebody explain why multiplying by 100 here gives a less accurate result but multiplying by 10 twice gives a more accurate result?
± % sc
Loading development environment (Rails 3.0.1)
>> 129.95 * 100
12994.999999999998
>> 129.95*10
1299.5
>> 129.95*10*10
12995.0

If you do the calculations by hand in double-precision binary, which is limited to 53 significant bits, you'll see what's going on:
129.95 = 1.0000001111100110011001100110011001100110011001100110 x 2^7
129.95*100 = 1.1001011000010111111111111111111111111111111111111111011 x 2^13
This is 56 significant bits long, so rounded to 53 bits it's
1.1001011000010111111111111111111111111111111111111111 x 2^13, which equals
12994.999999999998181010596454143524169921875
Now 129.95*10 = 1.01000100110111111111111111111111111111111111111111111 x 2^10
This is 54 significant bits long, so rounded to 53 bits it's 1.01000100111 x 2^10 = 1299.5
Now 1299.5 * 10 = 1.1001011000011 x 2^13 = 12995.
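The same three products can be checked in C. This sketch assumes a platform where double is IEEE 754 binary64 and is evaluated without extra precision (for example x86-64 with SSE2). On such platforms Ruby's Float is a native C double. The function names are illustrative only.

```c
#include <stdio.h>

/* One multiplication by 100: the exact product needs 56 significant
   bits, so it is rounded to 53 bits and lands just below 12995. */
double scale_by_100(double x) { return x * 100.0; }

/* Two multiplications by 10: 129.95 * 10 happens to round to exactly
   1299.5, and 1299.5 * 10 = 12995 is exactly representable. */
double scale_by_10_twice(double x) { return x * 10.0 * 10.0; }

void show_products(void) {
    /* %.17g prints enough significant digits to tell doubles apart. */
    printf("%.17g\n", scale_by_100(129.95));      /* 12994.999999999998 */
    printf("%.17g\n", scale_by_10_twice(129.95)); /* 12995 */
}
```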

First off: you are looking at the string representation of the result, not the actual result itself. If you really want to compare the two results, you should format both of them explicitly, using String#%, and you should format both of them the same way.
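String#% uses sprintf-style format directives, so the idea can be sketched with C's snprintf. Format both results with the same format string, then compare the text. `same_when_printed` is a hypothetical helper name, and the decimal counts in the usage note are arbitrary choices.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a and b produce the same text when both are formatted
   with the same "%.*f" directive and the given number of decimals. */
int same_when_printed(double a, double b, int decimals) {
    char sa[64], sb[64];
    snprintf(sa, sizeof sa, "%.*f", decimals, a);
    snprintf(sb, sizeof sb, "%.*f", decimals, b);
    return strcmp(sa, sb) == 0;
}
```

With 2 decimals, 129.95 * 100 and 129.95 * 10 * 10 both print as 12995.00. With 15 decimals the difference between them becomes visible.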
Secondly, that's just how binary floating point numbers work. They are inexact, they are finite and they are binary. All three mean that you get rounding errors, which generally look totally random, unless you happen to have memorized the entirety of IEEE754 and can recite it backwards in your sleep.

There is no floating point number exactly equal to 129.95. So your language uses a value which is close to it instead. When that value is multiplied by 100, the result is close to 12995, but it just so happens to not equal 12995. (It is also not exactly equal to 100 times the original value it used in place of 129.95.) So your interpreter prints a decimal number which is close to (but not equal to) the value of 129.95 * 100 and which shows you that it is not exactly 12995. It also just so happens that the result 129.95 * 10 is exactly equal to 1299.5. This is mostly luck.
Bottom line is, never expect equality out of any floating point arithmetic, only "closeness".
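Here is a minimal sketch of a "closeness" comparison in C. It follows the same formula as Python's math.isclose: a relative tolerance plus an absolute floor. The helper names and the tolerance values in the tests are assumptions chosen for illustration.

```c
/* Absolute value and maximum without <math.h>, so no -lm is needed. */
static double abs_d(double x) { return x < 0.0 ? -x : x; }
static double max_d(double a, double b) { return a > b ? a : b; }

/* "Close enough" test: |a - b| must be within rel_tol of the larger
   magnitude, or within abs_tol for values near zero. The tolerances
   are the caller's choice; no single value suits every computation. */
int nearly_equal(double a, double b, double rel_tol, double abs_tol) {
    double diff = abs_d(a - b);
    return diff <= max_d(rel_tol * max_d(abs_d(a), abs_d(b)), abs_tol);
}
```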

Related

Why multiply two double in dart result in very strange number

Can anyone explain why the result is 252.99999999999997 and not 253? What should be used instead to get 253?
double x = 2.11;
double y = 0.42;
print(((x + y) * 100)); // print 252.99999999999997
I am basically trying to convert a currency value with 2 decimal (ie £2.11) into pence/cent (ie 211p)
Thanks
In short: Because many fractional double values are not precise, and adding imprecise values can give even more imprecise results. That's an inherent property of IEEE-754 floating point numbers, which is what Dart (and most other languages and the CPUs running them) are working with.
Neither of the rational numbers 2.11 and 0.42 is precisely representable as a double value. When you write 2.11 in source code, it means the double value that is closest to the mathematical number 2.11.
The value of 2.11 is precisely 2.109999999999999875655021241982467472553253173828125.
The value of 0.42 is precisely 0.419999999999999984456877655247808434069156646728515625.
As you can see, both are slightly smaller than the value you intended.
Then you add those two values, which gives the precise double result 2.529999999999999804600747665972448885440826416015625. This loses a few of the last digits of the 0.42 to rounding. Both inputs were already smaller than 2.11 and 0.42, so the result is now even further below 2.53.
Finally you multiply that by 100, which gives the precise result 252.999999999999971578290569595992565155029296875.
This is different from the double value 253.0.
The double.toString method doesn't return a string of the exact value. It does, however, return different strings for different values, and since the value is different from 253.0, it must return a different string. It returns the shortest decimal number that is still closer to the result than to any adjacent double value, and that is the string you see.
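For the pence/cent conversion the question asks about, one common approach is to round to the nearest whole cent instead of truncating. The sketch below is in C, whose double is the same IEEE 754 type as Dart's double. It assumes non-negative amounts that still fit in a long long after scaling, and `to_cents` is a hypothetical name. In Dart itself, `(x * 100).round()` is the analogous call.

```c
/* Convert a non-negative amount (e.g. pounds) to whole pence by rounding
   to the nearest integer. Truncation would turn 252.99999999999997
   into 252; adding 0.5 before truncating gives 253. */
long long to_cents(double amount) {
    return (long long)(amount * 100.0 + 0.5);
}
```

Another option is to keep money as integer pence from the start, for example by parsing "2.11" directly to 211. That avoids the rounding step altogether.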

sscanf in flex changing value of input

I'm using flex and bison to read in a file that has text but also floating point numbers. Everything seems to be working fine, except that I've noticed that it sometimes changes the values of the numbers. For example,
-4.036 is (sometimes) becoming -4.0359998, and
-3.92 is (sometimes) becoming -3.9200001
The .l file is using the lines
static float fvalue ;
sscanf(specctra_dsn_file_yytext, "%f", &fvalue) ;
The values pass through the yacc parser and arrive at my own .cpp file as floats with the values described. Not all of the values are changed, and even the same value is changed in some occurrences, and unchanged in others.
Please let me know if I should add more information.
float cannot represent every number. It is typically 32-bit and so is limited to at most 2^32 different numbers. -4.036 and -3.92 are not in that set on your platform.
float is typically encoded using the IEEE 754 single-precision binary floating-point format (binary32), and it rarely encodes fractional decimal values exactly. When you assign values like "-3.92", the value actually stored is one close to that, but possibly not exact. In other words, the conversion of -3.92 to float is not exact, whether it is done by assignment or by sscanf().
float x1 = -3.92;
// float has an exact value of -3.9200000762939453125
// View # 6 significant digits -3.92000
// OP reported -3.9200001
float x2 = -4.036;
// float has an exact value of -4.035999774932861328125
// View # 6 significant digits -4.03600
// OP reported -4.0359998
If you print these values beyond a certain number of significant decimal digits (typically 6 for float), expect the output not to match the original assignment. See "Printf width specifier to maintain precision of floating-point value" for a deeper C post.
OP could lower expectations of how many digits will match. Alternatively, OP could use double, and then typically only see this problem when more than about 15 significant decimal digits are printed.
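Here is a small C sketch of the points above. It assumes IEEE 754 float and double and a correctly rounding sscanf, as in common C libraries such as glibc. The function names are hypothetical.

```c
#include <stdio.h>

/* Parse text into a float, as the flex rule does with "%f". */
float parse_float(const char *text) {
    float value = 0.0f;
    sscanf(text, "%f", &value);
    return value;
}

/* Parse the same text into a double ("%lf"), which keeps about
   15-17 significant decimal digits instead of about 6-9. */
double parse_double(const char *text) {
    double value = 0.0;
    sscanf(text, "%lf", &value);
    return value;
}

void show_digits(const char *text) {
    float f = parse_float(text);
    printf("%.6g\n", f);                   /* 6 digits: matches the input */
    printf("%.9g\n", f);                   /* 9 digits: exposes the float error */
    printf("%.15g\n", parse_double(text)); /* double viewed at 15 digits */
}
```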

Large lua numbers are being printed incorrectly

I have the following test case:
Lua 5.3.2 Copyright (C) 1994-2015 Lua.org, PUC-Rio
> foo = 1000000000000000000
> bar = foo + 1
> bar
1000000000000000001
> string.format("%.0f", foo)
1000000000000000000
> string.format("%.0f", bar)
1000000000000000000
That last line should be 1000000000000000001, since that's the value of bar, but for some reason it's not. This doesn't only apply to 1000000000000000000; I've yet to find any number above that one which gives the correct value. Can anyone give an explanation for why this happens?
You're formatting the number as floating-point, not integer. That's what %.0f is doing. At some point, floats lose precision. double, for example, will lose precision after about 16 decimal digits.
If you want to format an integer as an integer, then you need to format it as an integer, using standard printf rules:
string.format("%i", bar)
log2(1000000000000000000) is between 59 and 60, which means that the binary representation of that number needs 60 bits. double-precision floating point numbers have only 53 bits of precision, plus a power-of-two exponent with 11 bits of range. To store a number that large as floating point (which is what you requested with the %f format specifier), six to seven bits of precision are chopped off the end of the number, and the whole thing is multiplied by a power of two to get it back in range (2^59 in this case, I think). Chopping off those final bits removes the precision that allows 1000000000000000000 and 1000000000000000001 to be distinct from each other.
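The same effect can be reproduced in C. This assumes a 64-bit long long and an IEEE 754 double, which correspond to Lua 5.3's default integer and float types. `collapses_in_double` and `show_formats` are hypothetical names.

```c
#include <stdio.h>

/* Returns 1 if v and v + 1 become the same double, i.e. doubles near v
   are spaced 2 or more apart, so v + 1 is rounded onto v. */
int collapses_in_double(long long v) {
    return (double)v == (double)(v + 1);
}

void show_formats(long long v) {
    printf("%.0f\n", (double)v); /* float formatting, like "%.0f" in Lua */
    printf("%lld\n", v);         /* integer formatting keeps every digit */
}
```

The first integer where this happens is 2^53 = 9007199254740992. Below that value, every integer has its own double.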
(This is not a particularly precise description of floating point, apologies if my numbers or descriptions are not exact.)

How do I round a positive float up the next integer?

I need to round a positive float upwards to the nearest integer.
Examples:
1.0 rounds up to 1
2.1 rounds up to 3
3.5 rounds up to 4
4.9 rounds up to 5
i.e. always round up.
Use the Ceil function from the Math unit. From the documentation:
Rounds variables up toward positive infinity.
Call Ceil (as in ceiling) to obtain the lowest integer greater than or equal to X. The absolute value of X must be less than MaxInt. For example:
Ceil(-2.8) = -2
Ceil(2.8) = 3
Ceil(-1.0) = -1
I cannot tell whether the behaviour of Ceil meets your expectations for negative input values, because you did not specify what to do there. However, if Ceil does not meet your expectations, it is easy enough to write a function that meets your needs by combining Abs() and Ceil().
FindField('QTY').ASFLOAT := TRUNC(FindField('QTY').ASFLOAT) + 1
Works Fine
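Here is a C sketch comparing a ceiling computation with the Trunc + 1 formula. It assumes non-negative inputs that fit in a long long, and both function names are hypothetical. C's ceil() from <math.h> would normally be used instead. The hand-written version only avoids linking the math library. Note that the two approaches differ for inputs that are already whole numbers. For 1.0, the question expects 1, but Trunc + 1 gives 2.

```c
/* Smallest integer >= x, for non-negative x within long long range
   (the same result ceil() gives there). */
long long ceil_nonneg(double x) {
    long long t = (long long)x;          /* conversion truncates toward zero */
    return (x > (double)t) ? t + 1 : t;  /* bump only if there was a fraction */
}

/* The Trunc(x) + 1 formula, for comparison. */
long long trunc_plus_one(double x) {
    return (long long)x + 1;
}
```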

iOS calculating sum of filesizes always negative

I've got a strange problem here, and I'm sure it's just something small.
I receive information about files via JSON (RestKit is doing a good job).
I write the filesize of each file via Core Data to a local store.
Afterwards, within one of my view controllers, I need to sum up the file sizes of all files in the database. I fetch all files and then go through a loop (for) to sum the sizes up.
The problem is now, the result is always negative!
The coredata entity filesize is of type Integer 32 (filesize is reported in bytes by JSON).
I read the fetchresult in an NSArray allPublicationsToLoad and then try to sum up. The Objects in the NSArray of Type CDPublication have a value filesize of Type NSNumber:
for(int n = 0; n < [allPublicationsToLoad count]; n = n + 1)
{
    CDPublication* thePub = [allPublicationsToLoad objectAtIndex:n];
    allPublicationsSize = allPublicationsSize + [[thePub filesize] integerValue];
    sum = [NSNumber numberWithFloat:([sum floatValue] + [[thePub filesize] floatValue])];
}
Each single filesize of the CDPublication objects is positive and correct. Only the sum of all the file sizes is negative afterwards. There are around 240 objects right now, with filesize values between 4000 and 234.645.434.123.
Can somebody please give me a hint in the right direction?
Is the problem that Integer 32 or NSNumber can't hold such a huge range?
Thanks
MadMaxApp
The NSNumber object can't hold such a huge number. Because of the way negative numbers are stored the result is negative.
Negative numbers are stored using two's complement, which makes adding positive and negative numbers easier. The range of values a 32-bit integer can hold is split in two. The upper half (the values whose highest-order bit is 1) is considered negative, and the lower half (where the highest-order bit is 0) holds the normal non-negative numbers. Now, if you add sufficiently large numbers, the result lands in the upper half and is therefore interpreted as a negative number. Here's an illustration for the 4-bit integer situation (32 bits works exactly the same way, but there would be a lot more 0s and 1s to type ;))
With 4 bits you can represent this range of signed integers:
0000 (=0)
0001 (=1)
0010 (=2)
...
0111 (=7)
1000 (=-8)
1001 (=-7)
...
1111 (=-1)
The maximum positive integer you can represent is 7 in this case. If you add 5 and 4, for example, you get:
0101 + 0100 = 1001
1001 equals -7 when you represent signed integers like this (and not 9, as you would expect). That's the effect you are observing, but on a much larger scale (32 bits)
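The 4-bit table can be reproduced with a short C sketch. It keeps the low 4 bits of a value and reads them as two's complement. `as_signed4` is a hypothetical name.

```c
/* Interpret the low 4 bits of v as a 4-bit two's-complement integer:
   patterns 0000..0111 are 0..7, and 1000..1111 are -8..-1. */
int as_signed4(unsigned v) {
    v &= 0xFu;                                  /* keep only 4 bits */
    return (v & 0x8u) ? (int)v - 16 : (int)v;   /* high bit set: subtract 2^4 */
}
```

as_signed4(5u + 4u) returns -7, which matches the 0101 + 0100 = 1001 example.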
Your only option to get correct results in this case is to increase the number of bits used to represent your integers, so that the result doesn't land in the negative range of bit patterns. So if 32 bits is not enough (as in your case), you can use a long long (64 bits).
[myNumber longLongValue];
I think this has to do with int overflow: very large integers get reinterpreted as negatives when they overflow the size of int (32 bits). Use longLongValue instead of integerValue:
long long allPublicationsSize = 0;
for(int n = 0; n < [allPublicationsToLoad count]; n++) {
    CDPublication* thePub = [allPublicationsToLoad objectAtIndex:n];
    allPublicationsSize += [[thePub filesize] longLongValue];
}
This is an integer overflow issue associated with the use of two's complement arithmetic. For a 32-bit integer there are exactly 2^32 (4,294,967,296) possible integer values which can be expressed. With two's complement, the most significant bit is used as a sign bit. Half of the numbers represent non-negative integers (when the sign bit is 0), and the other half represent negative numbers (when the sign bit is 1). This gives an effective range of [-2^31, 2^31 - 1], or [-2,147,483,648, 2,147,483,647].
To overcome this problem for your case, you should consider using a 64-bit integer. This should work well for the range of values you seem to be interested in using. Alternatively, if even 64-bit is not sufficient, you should look for big integer libraries for iOS.
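The following C sketch models both accumulators using fixed-width types from <stdint.h>. Signed overflow is undefined behaviour in C, so the 32-bit case is simulated with unsigned arithmetic and then reinterpreted as two's complement. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The fix suggested above: accumulate in a 64-bit integer. */
int64_t sum_sizes_64(const int64_t *sizes, size_t n) {
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += sizes[i];
    }
    return total;
}

/* Model of a 32-bit signed accumulator. Unsigned arithmetic is defined
   to wrap modulo 2^32, and the final bit pattern is then reinterpreted
   as a two's-complement value. */
int64_t sum_sizes_wrapped32(const int64_t *sizes, size_t n) {
    uint32_t bits = 0;
    for (size_t i = 0; i < n; i++) {
        bits += (uint32_t)sizes[i];      /* keeps only the low 32 bits */
    }
    return (bits & 0x80000000u) ? (int64_t)bits - 4294967296LL /* sign bit set */
                                : (int64_t)bits;
}
```

Take two sizes of 2,000,000,000 bytes each. The 64-bit sum is 4,000,000,000, while the modelled 32-bit sum comes out as -294,967,296. That is the same kind of negative total described in the question.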
