Converting NSString to Float - Adds in Decimal Places? - ios

I am parsing some vertex information from an XML file which reads as follows (partial extract):
21081.7 23447.6 2781.62 24207.4 18697.3 -2196.96
I save the string as an NSString and then convert to a float value (which I will later feed into OpenGL ES)
NSString * xPoint = [finishedParsingArray objectAtIndex:baseIndex];
NSLog(@"xPoint is %@", xPoint);
float x = [xPoint floatValue];
The problem is that float x changes the values as follows :
21081.699219, 23447.599609, 2781.620117, 24207.400391, 18697.300781, -2196.959961
As you can see, it is changing the number of decimal places (not sure how it is doing this - must be hidden formatting in the xml file ?)
My question is how can I store the float to match the original number in the NSString / XML file to the same number of decimal places ?
Thanks in advance !

Your issue seems to be that you don't understand how floats are stored in memory and don't know that floats aren't precise.
Exact values often can't be stored and so the system picks the closest number it can to represent it. If you look carefully, you can see that each of the outputted numbers is very close to your inputted values.
For better accuracy, try using double instead. A double has the same kind of problem, but with much finer precision: a float carries about 7 significant decimal digits, a double about 15 to 16.
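To see the effect concretely, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The helper names `count_decimals` and `format_parsed` are made up for illustration, and the sketch assumes each token is a clean number with no surrounding whitespace. It parses the same text as a float or as a double and prints it with a chosen number of decimals.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Number of digits after the decimal point in the source text. */
int count_decimals(const char *text)
{
    assert(text != NULL);
    const char *dot = strchr(text, '.');
    return dot ? (int)strlen(dot + 1) : 0;
}

/* Hypothetical helper: parse `text` as a float (use_double == 0) or as a
 * double (use_double != 0), then format it with `places` decimals. */
void format_parsed(const char *text, int use_double, int places,
                   char *out, size_t out_size)
{
    assert(text != NULL && out != NULL);
    double value = use_double ? strtod(text, NULL)
                              : (double)strtof(text, NULL);
    snprintf(out, out_size, "%.*f", places, value);
}
```

With the question's data, formatting "21081.7" parsed as a float with 6 decimals gives 21081.699219, while passing `count_decimals("21081.7")` as `places` gives 21081.7 again. The stored float is the same in both cases; only the number of printed digits differs.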
Here are some other StackOverflow answers and external articles you should read:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Floating Points on Wikipedia
This answer on a similar question

All primitives which store floating point numbers have an accuracy issue. Most of the time it's so small it doesn't matter, but sometimes it's vital.
When it's important to keep the exact number, I would suggest using NSDecimalNumber.

Related

Outputting values from CAMPARY

I'm trying to use the CAMPARY library (CudA Multiple Precision ARithmetic librarY). I've downloaded the code and included it in my project. Since it supports both cpu and gpu, I'm starting with cpu to understand how it works and make sure it does what I need. But the intent is to use this with CUDA.
I'm able to instantiate an instance and assign a value, but I can't figure out how to get things back out. Consider:
#include <time.h>
#include "c:\\vss\\CAMPARY\\Doubles\\src_cpu\\multi_prec.h"
int main()
{
const char *value = "123456789012345678901234567";
multi_prec<2> a(value);
a.prettyPrint();
a.prettyPrintBin();
a.prettyPrintBin_UnevalSum();
char *cc = a.prettyPrintBF();
printf("\n%s\n", cc);
free(cc);
}
Compiles, links, runs (VS 2017). But the output is pretty unhelpful:
Prec = 2
Data[0] = 1.234568e+26
Data[1] = 7.486371e+08
Prec = 2
Data[0] = 0x1.987bf7c563caap+86;
Data[1] = 0x1.64fa5c3800000p+29;
0x1.987bf7c563caap+86 + 0x1.64fa5c3800000p+29;
1.234568e+26 7.486371e+08
Printing each of the doubles like this might be easy to do, but it doesn't tell you much about the value of the 128-bit number being stored. Performing highly accurate computations is of limited value if there's no way to output the results.
In addition to just printing out the value, eventually I also need to convert these numbers to ints (I'm willing to try it all in floats if there's a way to print, but I fear that both accuracy and speed will suffer). Unlike MPIR (which doesn't support CUDA), CAMPARY doesn't have any associated multi-precision int type, just floats. I can probably cobble together what I need (mostly just add/subtract/compare), but only if I can get the integer portion of CAMPARY's values back out, which I don't see a way to do.
CAMPARY doesn't seem to have any docs, so it's conceivable these capabilities are there, and I've simply overlooked them. And I'd rather ask on the CAMPARY discussion forum/mail list, but there doesn't seem to be one. That's why I'm asking here.
To sum up:
Is there any way to output the 128-bit ( multi_prec<2> ) values from CAMPARY?
Is there any way to extract the integer portion from a CAMPARY multi_prec? Perhaps one of the (many) math functions in the library that I don't understand computes this?
There are really only 2 possible answers to this question:
There's another (better) multi-precision library that works on CUDA that does what you need.
Here's how to modify this library to do what you need.
The only people who could give the first answer are CUDA programmers. Unfortunately, if there were such a library, I feel confident talonmies would have known about it and mentioned it.
As for #2, why would anyone update this library if they weren't a CUDA programmer? There are other, much better multi-precision libraries out there. The ONLY benefit CAMPARY offers is that it supports CUDA. Which means the only people with any real motivation to work with or modify the library are CUDA programmers.
And, as the CUDA programmer with the most vested interest in solving this, I did figure out a solution (albeit an ugly one). I'm posting it here in the hopes that the information will be of value to future CAMPARY programmers. There's not much information out there for this library, so this is a start.
The first thing you need to understand is how CAMPARY stores its data. And, while not complex, it isn't what I expected. Coming from MPIR, I assumed that CAMPARY stored its data pretty much the same way: a fixed size exponent followed by an arbitrary number of bits for the mantissa.
But nope, CAMPARY went a different way. Looking at the code, we see:
private:
double data[prec];
Now, I assumed that this was just an arbitrary way of reserving the number of bits they needed. But no, they really do use prec doubles. Like so:
multi_prec<8> a("2633716138033644471646729489243748530829179225072491799768019505671233074369063908765111461703117249");
// Looking at a in the VS debugger:
[0] 2.6337161380336443e+99 const double
[1] 1.8496577979210756e+83 const double
[2] 1.2618399223120249e+67 const double
[3] -3.5978270144026257e+48 const double
[4] -1.1764513205926450e+32 const double
[5] -2479038053160511.0 const double
[6] 0.00000000000000000 const double
[7] 0.00000000000000000 const double
So, what they are doing is storing the max amount of precision possible in the first double, then the remainder is used to compute the next double and so on until they encompass the entire value, or run out of precision (dropping the least significant bits). Note that some of these are negative, which means the sum of the preceding values is a bit bigger than the actual value and they are correcting it downward.
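The same idea can be shown on a small scale in C. This is only an illustration of the "leading double plus correction" layout described above, not CAMPARY code; the function name `split_int64` and the range limit are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split an integer into a leading double plus a correction term, so that
 * hi + lo (summed exactly) equals n. Assumes |n| < 2^62 so that converting
 * hi back to int64_t cannot overflow. */
void split_int64(int64_t n, double *hi, double *lo)
{
    assert(hi != NULL && lo != NULL);
    assert(n > -(INT64_C(1) << 62) && n < (INT64_C(1) << 62));
    *hi = (double)n;                   /* nearest double; may round up or down */
    *lo = (double)(n - (int64_t)*hi);  /* what the first double missed */
}
```

For 9007199254740993 (2^53 + 1) the first double rounds down to 9007199254740992 and the correction is 1. For 9007199254740995 the first double rounds up to 9007199254740996, so the correction is -1, which is the same reason some of the components in the debugger listing are negative.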
With this in mind, we return to the question of how to print it.
In theory, you could just add all these together to get the right answer. But kinda by definition, we already know that C doesn't have a datatype to hold a value this size. But other libraries do (say MPIR). Now, MPIR doesn't work on CUDA, but it doesn't need to. You don't want to have your CUDA code printing out data. That's something you should be doing from the host anyway. So do the computations with the full power of CUDA, cudaMemcpy the results back, then use MPIR to print them out:
#define MPREC 8
void ShowP(const multi_prec<MPREC> value)
{
multi_prec<MPREC> temp(value), temp2;
// from mpir at mpir.org
mpf_t mp, mp2;
mpf_init2(mp, value.getPrec() * 64); // Make sure we reserve enough room
mpf_init(mp2); // Only needs to hold one double.
const double *ptr = value.getData();
mpf_set_d(mp, ptr[0]);
for (int x = 1; x < value.getPrec(); x++)
{
// MPIR doesn't have a mpf_add_d, so we need to load the value into
// an mpf_t.
mpf_set_d(mp2, ptr[x]);
mpf_add(mp, mp, mp2);
}
// Using base 10, write the full precision (0) of mp, to stdout.
mpf_out_str(stdout, 10, 0, mp);
mpf_clears(mp, mp2, NULL);
}
Used with the number stored in the multi_prec above, this outputs the exact same value. Yay.
It's not a particularly elegant solution. Having to add a second library just to print a value from the first is clearly sub-optimal. And this conversion can't be all that speedy either. But printing is typically done (much) less frequently than computing. If you do an hour's worth of computing and a handful of prints, the performance doesn't much matter. And it beats the heck out of not being able to print at all.
CAMPARY has a lot of shortcomings (undocumented, unsupported, unmaintained). But for people who need multi-precision numbers on CUDA (especially if you need sqrt), it's the best option I've found.

Create a JSON string with number of significant figures / decimal places based on key IOS OBJ C

I need to upload JSON data from an app (IOS) to the backend server.
The goal is to optimise the size of the upload packet which is JSON encoded as a NSString. The string is currently about 5MB but contains mostly doubles which have more precision than necessary.
The size of the packet can be reduced by around 40-50% by removing unnecessary decimal places in doubles. This has to be customisable based on the key.
What is the best way to create a JSON string with different numbers of significant figures or decimal places depending on the key.
You may need to do some experiments. Let's say you want to send data with two decimal digits, like 3.14 instead of pi. You know you have to turn all numbers into NSNumber. You would turn x into a number with two decimals by writing
double asDouble = 3.141592653;
NSNumber* asNumber = @(round (asDouble * 100.0) / 100.0);
However, you need to check that this always works; with some bad luck this could send 3.140000000000000000000001 to your server.
Obviously you can replace the 100.0 with 1000.0 etc. Do not replace the division with a multiplication by 0.01 because that will increase rounding errors and the chance that you get tons of decimal digits.
You might check what happens if you write
NSNumber* asNumber = @((float) asDouble);
If NSJSONSerialization is clever enough, it will send fewer decimals.
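If the serializer does not cooperate, a different route is to write the number text yourself with a fixed number of decimals per key. The sketch below is plain C and does not use NSNumber or NSJSONSerialization at all; the keys, the digit counts, and the function names are hypothetical. It also assumes keys need no JSON escaping and values are finite (NaN and infinity are not valid JSON).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-key precision table; keys and digit counts are made up. */
struct key_precision { const char *key; int places; };

static const struct key_precision PRECISION[] = {
    { "latitude", 5 },
    { "speed",    1 },
};

/* Decimal places to emit for `key`, or `fallback` if the key is not listed. */
int places_for_key(const char *key, int fallback)
{
    assert(key != NULL);
    for (size_t i = 0; i < sizeof PRECISION / sizeof PRECISION[0]; i++) {
        if (strcmp(PRECISION[i].key, key) == 0)
            return PRECISION[i].places;
    }
    return fallback;
}

/* Write one "key":value member into `out`, rounded to the key's precision.
 * Returns the number of characters snprintf would have written. */
int format_member(const char *key, double value, char *out, size_t out_size)
{
    assert(out != NULL);
    return snprintf(out, out_size, "\"%s\":%.*f",
                    key, places_for_key(key, 3), value);
}
```

Because the digits are produced by the formatter, the output cannot pick up a long tail such as 3.140000000000000000000001. The cost is that you assemble the JSON text by hand.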

Why 0.9 is equal to 0.89999997615814208? [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 8 years ago.
I am reading from a txt file and populating a core data entity.
At some point I have read the value from the TXT file and the value is @"0.9".
Now I assign it to a CGFloat.
CGFloat value = (CGFloat)[stringValue floatValue];
debugger shows value as 0.89999997615814208 !!!!!!?????
why? bug? Even if it thinks [stringValue floatValue] is a double, casting it to CGFloat should not produce that abnormality.
The binary floating point representation used for float can't store the value exactly. So it uses the closest representable value.
It's similar to decimal numbers: It's impossible to represent one third in decimal (instead we use an approximate representation like 0.3333333).
To store a float in binary you can only approximate it by summing up fractions like 1/2, 1/4, 1/8 etc. For 0.9 (and many other values) there is no exact representation that can be constructed from summing fractions like this. Whereas if the value was, say, 0.25 you could represent that exactly as 1/4.
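The summing of fractions can be sketched in a few lines of C. The function name `binary_fraction` is made up for this illustration, and it only handles values in the range 0 to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Greedily approximate `value` (0 <= value < 1) with the first `bits` binary
 * fractions 1/2, 1/4, 1/8, ... Returns the approximation and stores what is
 * left over in *residual (0 means the value was represented exactly). */
double binary_fraction(double value, int bits, double *residual)
{
    assert(value >= 0.0 && value < 1.0 && residual != NULL);
    double approx = 0.0, remaining = value, term = 0.5;
    for (int i = 0; i < bits; i++, term /= 2.0) {
        if (remaining >= term) {   /* this fraction fits, so take it */
            approx += term;
            remaining -= term;
        }
    }
    *residual = remaining;
    return approx;
}
```

For 0.25 the residual is zero after two fractions. For 0.9 with 24 fractions (the precision a float has for values between 0.5 and 1) there is always something left over, and the approximation is 15099494/16777216 = 0.89999997615814208984375, which is the number the debugger displayed.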
Floating point imprecision, check out http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
This basically comes down to how floating-point numbers work: they don't store the decimal number you typed, they store a binary significand and an exponent. The value is reconstructed as significand × 2^exponent, which doesn't always turn out to be exactly what you assigned to it.

Ruby Floating Point Math - Issue with Precision in Sum Calc

Good morning all,
I'm having some issues with floating point math, and have gotten totally lost in ".to_f"'s, "*100"'s and ".0"'s!
I was hoping someone could help me with my specific problem, and also explain exactly why their solution works so that I understand this for next time.
My program needs to do two things:
Sum a list of decimals, determine if they sum to exactly 1.0
Determine a difference between 1.0 and a sum of numbers - set the value of a variable to the exact difference to make the sum equal 1.0.
For example:
[0.28, 0.55, 0.17] -> should sum to 1.0, however I keep getting 1.xxxxxx. I am implementing the sum in the following fashion:
sum = array.inject(0.0){|sum,x| sum+ (x*100)} / 100
The reason I need this functionality is that I'm reading in a set of decimals that come from excel. They are not 100% precise (they are lacking some decimal points) so the sum usually comes out of 0.999999xxxxx or 1.000xxxxx. For example, I will get values like the following:
0.568887955,0.070564759,0.360547286
To fix this, I am ok taking the sum of the first n-1 numbers, and then changing the final number slightly so that all of the numbers together sum to 1.0 (must meet validation using the equation above, or whatever I end up with). I'm currently implementing this as follows:
sum = 0.0
array.each do |item|
sum += item * 100.0
end
array[i] = (100 - sum.round)/100.0
I know I could do this with inject, but was trying to play with it to see what works. I think this is generally working (from inspecting the output), but it doesn't always meet the validation sum above. So if need be I can adjust this one as well. Note that I only need two decimal precision in these numbers - i.e. 0.56 not 0.5623225. I can either round them down at time of presentation, or during this calculation... It doesn't matter to me.
Thank you VERY MUCH for your help!
If accuracy is important to you, you should not be using floating point values, which, by definition, are not accurate. Ruby has some precision data types for doing arithmetic where accuracy is important. They are, off the top of my head, BigDecimal, Rational and Complex, depending on what you actually need to calculate.
It seems that in your case, what you're looking for is BigDecimal, which stores the number in base 10 with arbitrary precision, so a decimal string such as "0.9987" is held exactly (in contrast to a binary floating point, which can only approximate most decimal fractions).
When you read from Excel and deliberately cast those strings like "0.9987" to floating points, you're immediately losing the accurate value that is contained in the string.
require "bigdecimal"
BigDecimal("0.9987")
That value is precise. It is 0.9987. Not 0.998732109, or anything close to it, but 0.9987. You may use all the usual arithmetic operations on it. Provided you don't mix floating points into the arithmetic operations, the return values will remain precise.
If your array contains the raw strings you got from Excel (i.e. you haven't #to_f'd them), then this will give you a BigDecimal that is the difference between the sum of them and 1.
1 - array.map{|v| BigDecimal(v)}.reduce(:+)
Either:
continue using floats and round(2) your totals: 12.341.round(2) # => 12.34
use integers (i.e. cents instead of dollars)
use BigDecimal and you won't need to round after summing them, as long as you start with BigDecimal with only two decimals.
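The integer option in the list above can be sketched as follows. The thread is about Ruby, but the sketch is written in C; the idea carries over unchanged. The function names are hypothetical, input is assumed to be a non-negative decimal string, and truncating extra digits after the second decimal is a choice made for this sketch, not something the thread prescribes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parse a non-negative decimal string such as "0.28" into hundredths (28)
 * without going through a float. Digits past the second decimal are dropped. */
long parse_hundredths(const char *text)
{
    assert(text != NULL);
    long whole = 0, frac = 0;
    int frac_digits = 0;
    while (isdigit((unsigned char)*text))
        whole = whole * 10 + (*text++ - '0');
    if (*text == '.') {
        text++;
        while (isdigit((unsigned char)*text)) {
            if (frac_digits < 2) {
                frac = frac * 10 + (*text - '0');
                frac_digits++;
            }
            text++;
        }
    }
    while (frac_digits++ < 2)   /* "0.5" means 50 hundredths */
        frac *= 10;
    return whole * 100 + frac;
}

/* Sum a list of decimal strings, working in hundredths throughout. */
long sum_hundredths(const char *const *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += parse_hundredths(values[i]);
    return total;
}
```

With the values from the question, "0.28", "0.55" and "0.17" become 28, 55 and 17, and the check "does this sum to 1.0" becomes an exact integer comparison against 100.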
I think that algorithms have a great deal more to do with accuracy and precision than a choice of IEEE floating point over another representation.
People used to do some fine calculations while still dealing with accuracy and precision issues. They'd do it by managing the algorithms they'd use and understanding how to represent functions more deeply. I think that you might be making a mistake by throwing aside that better understanding and assuming that another representation is the solution.
For example, no polynomial representation of a function will deal with an asymptote or singularity properly.
Don't discard floating point so quickly. It could be that being smarter about the way you use them will do just fine.

Converting `NSDecimalNumber` to `SInt64` without precision loss. (within range of SInt64, iOS)

Currently, [NSDecimalNumber longLongValue] on a number created with the string @"9999999999999999" returns 10000000000000000.
This means the class converts its value to double first, and then converts that into SInt64 (signed long long).
How can I avoid this behavior? I want to get the precise integral number within the range of SInt64.
PS.
I considered converting to NSString and re-converting into SInt64 with NSScanner or strtoll, but I believe there's a better way. If you are sure there is no other way, please tell me that.
First: unless you're sure it's performance-critical, I'd write it into a string and scan it back. That's the easy way.
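A minimal sketch of the scanning half in C, assuming you already have the number's decimal text as a C string containing a plain integer. The function names are made up; `parse_via_double` is included only to reproduce the lossy behaviour from the question.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal integer string exactly. Returns 0 on success and -1 if the
 * text is not a plain integer or does not fit in a long long. */
int parse_exact(const char *text, long long *out)
{
    assert(text != NULL && out != NULL);
    char *end = NULL;
    errno = 0;
    long long value = strtoll(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;
    *out = value;
    return 0;
}

/* The lossy route, for comparison: go through a double first. */
long long parse_via_double(const char *text)
{
    assert(text != NULL);
    return (long long)strtod(text, NULL);
}
```

For "9999999999999999" the exact parse returns 9999999999999999, while the route through double returns 10000000000000000, because the nearest representable double to that value is 10^16.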
Now, if you really want to do it otherwise:
get an NSDecimal from your NSDecimalNumber
work with the private fields of the structure, initialize your long long value from the mantissa (possibly introduce checks to handle too-large mantissas)
multiply by 10^exponent; you can do that using binary exponentiation; again, check for overflow
Start with an NSDecimalNumber* originalValue.
Let int64_t approx = [originalValue longLongValue]. This will not be exact, but quite close.
Convert approx to NSDecimalNumber, calculate originalValue - approx, take the longLongValue of that difference, and add it to approx. Now you have the correct result.
