(La)TeX Base 10 fixed point arithmetic - latex

I'm trying to implement decimal arithmetic in (La)TeX. I'm trying to use dimens to store the values. I want the arithmetic to be exact to some (fixed) number of decimal places. If I use 1pt as my base unit, then this fails, because \divide rounds down, so 1pt / 10 gives 0.09999pt. If I use something like 1000sp as my base unit, then I get working fixed point arithmetic with 3 decimal places, but I can't figure out an easy way to format the numbers. If I try to convert them to pt, so I can use TeX's display mechanism, I have the same problem with \divide.
How do I fix this problem, or work around it?

The fp package provides fixed point arithmetic for LaTeX. The LaTeX3 Project are currently implementing something similar as part of the expl3 bundle. The code is currently not on CTAN, but can be grabbed from the SVN (or will appear when the next update from the SVN to CTAN takes place).

I would represent all the values as integers and scale them appropriately. For example, when you need three decimal digits, 0.124 would be represented as 124. This is nice because addition and subtraction are trivial. When multiplying two numbers a and b, you would have to divide the result by 1000 to get the proper representation. Dividing works by multiplying the result with 1000.
You still have to get the rounding issues correct, but this isn't very difficult. At least if you don't get near the maximum representable integer (I don't remember if it's 2^31-1 or 2^30-1).
Here is some code:
\def\fixadd#1#2#3{%
#1=#2\relax
\advance #1 by #3\relax
}
\def\fixsub#1#2#3{%
#1=#2\relax
#1=-#1\relax
\advance #1 by #3\relax
#1=-#1\relax
}
\def\fixmul#1#2#3{%
#1=#2\relax
\multiply #1 by #3\relax
\divide #1 by 1000\relax
}
\def\fixdiv#1#2#3{%
#1=#2\relax
\divide #1 by #3\relax
\multiply #1 by 1000\relax
}
\newcount\numa
\newcount\numb
\newcount\numc
\numa=1414
\numb=2828
\fixmul\numc\numa\numb
\the\numc
\bye
The operations are modeled after a three register machine, where the first is the destination and the other two are the operands. The rounding after the multiplication and division, including corner cases for very large or very small numbers are left as an exercise to you.

Related

Google Sheet yields infinitesimal number as remainder of an integer/whole number

I have this worksheet where I need to create a checker to determine whether a number (result of dividing the sum of two numbers by another value --DIVISOR) is an integer/does not have decimals. Upon running the said checker, it mostly worked just fine but appeared to detect that a few items are not integers despite being exact multiples of the DIVISOR.
https://docs.google.com/spreadsheets/d/17-idS5G0kUI7JoHAx3qcJOiJ-zofmMrg93hUvZuxPiA/edit#gid=0
I have two values (V1 and V2) whose sum I need to divide by a certain number (Divisor).
I need the OUTPUT to be an integer/whole number. Since the DIVISOR is a multiple of SUM (V1,V2), the OUTPUT is supposed to be a whole number. I also expanded the number of decimal places to make sure that there are no trailing numbers after the decimal point.
However, upon running the MOD function over the OUTPUT, it generated some infinitesimal value.
I also tried TRUNCATING the OUTPUT and getting the DIFFERENCE between the TRUNC and OUTPUT. It yielded the same remainder value as the MOD result.
I downloaded the GSheet and opened it in MS Excel. There seems to be no problem with the DIFFERENCE result, but the MOD function yielded yet another value.
actually, this is not a bug and it is pretty common. its called a floating point "error" and in a nutshell, it has to do things with how decimal numbers are stored within a google sheets (even excel or any other app)
more details can be found here: https://en.wikipedia.org/wiki/IEEE_754
to counter it you will need to introduce rounding like:
=ROUND(SUM(A1:A))
this is not an ideal solution for all cases so depending on your requirements you may need to use these instead of ROUND:
ROUNDUP
ROUNDDOWN
TRUNC
TEXT

Neo4j floating point sum different results

I am using neo4j to calculate some statistics on a data set. For that I am often using sum on a floating point value. I am getting different results depending on the circumstances. For example, a query that does this:
...
WITH foo
ORDER BY foo.fooId
RETURN SUM(foo.Weight)
Returns different result than the query that simply does the sum:
...
RETURN SUM(foo.Weight)
The differences are miniscule (293.07724195098984 vs 293.07724195099007). But it is enough to make simple equality checks fail. Another example would be a different instance of the database, loaded with the same data using the same loading process can produce the same issue (the dbs might not be 1:1, the load order of some relations might be different). I took the raw values that neo4j sums (by simply removing the SUM()) and verified that they are the same in all cases (different dbs and ordered/not ordered).
What are my options here? I don't mind losing some precision (I already tried to cut down the precision from 15 to 12 decimal places but that did not seem to work), but I need the results to match up.
Because of rounding errors, floats are not associative. (a+b)+c!=a+(b+c).
The result of every operation is rounded to fit the floats coding constraints and (a+b)+c is implemented as round(round(a+b) +c) while a+(b+c) as round(a+round(b+c)).
As an obvious illustration, consider the operation (2^-100 + 1 -1). If interpreted as a (2^-100 + 1)-1, it will return 0, as 1+2^-100 would require a precision too large for floats or double coding in IEEE754 and can only be coded as 1.0. While (2^-100 +(1-1)) correctly returns 2^-100 that can be coded by either floats or doubles.
This is a trivial example, but these rounding errors may exist after every operation and explain why floating point operations are not associative.
Databases generally do not return data in a garanteed order and depending on the actual order, operations will be done differently and that explains the behaviour that you have.
In general, for this reason, it not a good idea to do equality comparison on floats. Generally, it is advised to replace a==b by abs(a-b) is "sufficiently" small.
"sufficiently" may depend on your algorithm. float are equivalent to ~6-7 decimals and doubles to 15-16 decimals (and I think that it is what is used on your DB). Depending on the number of computations, you may have the last 1--3 decimals affected.
The best is probably to use
abs(a-b)<relative-error*max(abs(a),abs(b))
where relative-error must be adjusted to your problem. Probably something around 10^-13 can be correct, but you must experiment, as rounding errors depends on the number of computations, on the dispersion of the values and on what you may consider as "equal" for you problem.
Look at this site for a discussion on comparison methods. And read What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg that discusses, among others, these problems.

Ruby Floating Point Math - Issue with Precision in Sum Calc

Good morning all,
I'm having some issues with floating point math, and have gotten totally lost in ".to_f"'s, "*100"'s and ".0"'s!
I was hoping someone could help me with my specific problem, and also explain exactly why their solution works so that I understand this for next time.
My program needs to do two things:
Sum a list of decimals, determine if they sum to exactly 1.0
Determine a difference between 1.0 and a sum of numbers - set the value of a variable to the exact difference to make the sum equal 1.0.
For example:
[0.28, 0.55, 0.17] -> should sum to 1.0, however I keep getting 1.xxxxxx. I am implementing the sum in the following fashion:
sum = array.inject(0.0){|sum,x| sum+ (x*100)} / 100
The reason I need this functionality is that I'm reading in a set of decimals that come from excel. They are not 100% precise (they are lacking some decimal points) so the sum usually comes out of 0.999999xxxxx or 1.000xxxxx. For example, I will get values like the following:
0.568887955,0.070564759,0.360547286
To fix this, I am ok taking the sum of the first n-1 numbers, and then changing the final number slightly so that all of the numbers together sum to 1.0 (must meet validation using the equation above, or whatever I end up with). I'm currently implementing this as follows:
sum = 0.0
array.each do |item|
sum += item * 100.0
end
array[i] = (100 - sum.round)/100.0
I know I could do this with inject, but was trying to play with it to see what works. I think this is generally working (from inspecting the output), but it doesn't always meet the validation sum above. So if need be I can adjust this one as well. Note that I only need two decimal precision in these numbers - i.e. 0.56 not 0.5623225. I can either round them down at time of presentation, or during this calculation... It doesn't matter to me.
Thank you VERY MUCH for your help!
If accuracy is important to you, you should not be using floating point values, which, by definition, are not accurate. Ruby has some precision data types for doing arithmetic where accuracy is important. They are, off the top of my head, BigDecimal, Rational and Complex, depending on what you actually need to calculate.
It seems that in your case, what you're looking for is BigDecimal, which is basically a number with a fixed number of digits, of which there are a fixed number of digits after the decimal point (in contrast to a floating point, which has an arbitrary number of digits after the decimal point).
When you read from Excel and deliberately cast those strings like "0.9987" to floating points, you're immediately losing the accurate value that is contained in the string.
require "bigdecimal"
BigDecimal("0.9987")
That value is precise. It is 0.9987. Not 0.998732109, or anything close to it, but 0.9987. You may use all the usual arithmetic operations on it. Provided you don't mix floating points into the arithmetic operations, the return values will remain precise.
If your array contains the raw strings you got from Excel (i.e. you haven't #to_f'd them), then this will give you a BigDecimal that is the difference between the sum of them and 1.
1 - array.map{|v| BigDecimal(v)}.reduce(:+)
Either:
continue using floats and round(2) your totals: 12.341.round(2) # => 12.34
use integers (i.e. cents instead of dollars)
use BigDecimal and you won't need to round after summing them, as long as you start with BigDecimal with only two decimals.
I think that algorithms have a great deal more to do with accuracy and precision than a choice of IEEE floating point over another representation.
People used to do some fine calculations while still dealing with accuracy and precision issues. They'd do it by managing the algorithms they'd use and understanding how to represent functions more deeply. I think that you might be making a mistake by throwing aside that better understanding and assuming that another representation is the solution.
For example, no polynomial representation of a function will deal with an asymptote or singularity properly.
Don't discard floating point so quickly. I could be that being smarter about the way you use them will do just fine.

Project Euler -Prob. #20 (Lua)

http://projecteuler.net/problem=20
I've written code to figure out this problem, however, it seems to be accurate in some cases, and inaccurate in others. When I try solving the problem to 10 (answer is given in question, 27) I get 27, the correct answer. However, when I try solving the question given (100) I get 64, the incorrect answer, as the answer is something else.
Here's my code:
function factorial(num)
if num>=1 then
return num*factorial(num-1)
else
return 1
end
end
function getSumDigits(str)
str=string.format("%18.0f",str):gsub(" ","")
local sum=0
for i=1,#str do
sum=sum+tonumber(str:sub(i,i))
end
return sum
end
print(getSumDigits(tostring(factorial(100))))
64
Since Lua converts large numbers into scientific notation, I had to convert it back to standard notation. I don't think this is a problem, though it might be.
Is there any explanation to this?
Unfortunately, the correct solution is more difficult. The main problem here is that Lua uses 64bit floating point variables, which means this applies.
Long story told short: The number of significant digits in a 64bit float is much too small to store a number like 100!. Lua's floats can store a maximum of 52 mantissa bits, so any number greater than 2^52 will inevitably suffer from rounding errors, which gives you a little over 15 decimal digits. To store 100!, you'll need at least 158 decimal digits.
The number calculated by your factorial() function is reasonably close to the real value of 100! (i.e. the relative error is small), but you need the exact value to get the right solution.
What you need to do is implement your own algorithms for dealing with large numbers. I actually solved that problem in Lua by storing each number as a table, where each entry stores one digit of a decimal number. The complete solution takes a little more than 50 lines of code, so it's not too difficult and a nice exercise.

Why does this code causes the machine to crash?

I am trying to run this code but it keeps crashing:
log10(x):=log(x)/log(10);
char(x):=floor(log10(x))+1;
mantissa(x):=x/10**char(x);
chop(x,d):=(10**char(x))*(floor(mantissa(x)*(10**d))/(10**d));
rnd(x,d):=chop(x+5*10**(char(x)-d-1),d);
d:5;
a:10;
Ibwd:[[30,rnd(integrate((x**60)/(1+10*x^2),x,0,1),d)]];
for n from 30 thru 1 step -1 do Ibwd:append([[n-1,rnd(1/(2*n-1)-a*last(first(Ibwd)),d)]],Ibwd);
Maxima crashes when it evaluates the last line. Any ideas why it may happen?
Thank you so much.
The problem is that the difference becomes negative and your rounding function dies horribly with a negative argument. To find this out, I changed your loop to:
for n from 30 thru 1 step -1 do
block([],
print (1/(2*n-1)-a*last(first(Ibwd))),
print (a*last(first(Ibwd))),
Ibwd: append([[n-1,rnd(1/(2*n-1)-a*last(first(Ibwd)),d)]],Ibwd),
print (Ibwd));
The last difference printed before everything fails miserably is -316539/6125000. So now try
rnd(-1,3)
and see the same problem. This all stems from the fact that you're taking the log of a negative number, which Maxima interprets as a complex number by analytic continuation. Maxima doesn't evaluate this until it absolutely has to and, somewhere in the evaluation code, something's dying horribly.
I don't know the "fix" for your specific example, since I'm not exactly sure what you're trying to do, but hopefully this gives you enough info to find it yourself.
If you want to deconstruct a floating point number, let's first make sure that it is a bigfloat.
say z: 34.1
You can access the parts of a bigfloat by using lisp, and you can also access the mantissa length in bits by ?fpprec.
Thus ?second(z)*2^(?third(z)-?fpprec) gives you :
4799148352916685/140737488355328
and bfloat(%) gives you :
3.41b1.
If you want the mantissa of z as an integer, look at ?second(z)
Now I am not sure what it is that you are trying to accomplish in base 10, but Maxima
does not do internal arithmetic in base 10.
If you want more bits or fewer, you can set fpprec,
which is linked to ?fpprec. fpprec is the "approximate base 10" precision.
Thus fpprec is initially 16
?fpprec is correspondingly 56.
You can easily change them both, e.g. fpprec:100
corresponds to ?fpprec of 335.
If you are diddling around with float representations, you might benefit from knowing
that you can look at any of the lisp by typing, for example,
?print(z)
which prints the internal form using the Lisp print function.
You can also trace any function, your own or system function, by trace.
For example you could consider doing this:
trace(append,rnd,integrate);
If you want to use machine floats, I suggest you use, for the last line,
for n from 30 thru 1 step -1 do :
Ibwd:append([[n-1,rnd(1/(2.0*n- 1.0)-a*last(first(Ibwd)),d)]],Ibwd);
Note the decimal points. But even that is not quite enough, because integration
inserts exact structures like atan(10). Trying to round these things, or compute log
of them is probably not what you want to do. I suspect that Maxima is unhappy because log is given some messy expression that turns out to be negative, even though it initially thought otherwise. It hands the number to the lisp log program which is perfectly happy to return an appropriate common-lisp complex number object. Unfortunately, most of Maxima was written BEFORE LISP HAD COMPLEX NUMBERS.
Thus the result (log -0.5)= #C(-0.6931472 3.1415927) is entirely unexpected to the rest of Maxima. Maxima has its own form for complex numbers, e.g. 3+4*%i.
In particular, the Maxima display program predates the common lisp complex number format and does not know what to do with it.
The error (stack overflow !!!) is from the display program trying to display a common lisp complex number.
How to fix all this? Well, you could try changing your program so it computes what you really want, in which case it probably won't trigger this error. Maxima's display program should be fixed, too. Also, I suspect there is something unfortunate in simplification of logs of numbers that are negative but not obviously so.
This is probably waaay too much information for the original poster, but maybe the paragraph above will help out and also possibly improve Maxima in one or more places.
It appears that your program triggers an error in Maxima's simplification (algebraic identities) code. We are investigating and I hope we have a bug fix soon.
In the meantime, here is an idea. Looks like the bug is triggered by rnd(x, d) when x < 0. I guess rnd is supposed to round x to d digits. To handle x < 0, try this:
rnd(x, d) := if x < 0 then -rnd1(-x, d) else rnd1(x, d);
rnd1(x, d) := (... put the present definition of rnd here ...);
When I do that, the loop runs to completion and Ibwd is a list of values, but I don't know what values to expect.

Resources