AFAIK, Currency type in Delphi Win32 depends on the processor floating point precision. Because of this I'm having rounding problems when comparing two Currency values, returning different results depending on the machine.
For now I'm using the SameValue function passing a Epsilon parameter = 0.009, because I only need 2 decimal digits precision.
Is there any better way to avoid this problem?
The Currency type in Delphi is a 64-bit integer scaled by 1/10,000; in other words, its smallest increment is equivalent to 0.0001. It is not susceptible to precision issues in the same way that floating point code is.
However, if you are multiplying your Currency numbers by floating-point types, or dividing your Currency values, the rounding does need to be worked out one way or the other. The FPU controls this mechanism (it's called the "control word"). The Math unit contains some procedures which control this mechanism: SetRoundMode in particular. You can see the effects in this program:
{$APPTYPE CONSOLE}
uses Math;
var
x: Currency;
y: Currency;
begin
SetRoundMode(rmTruncate);
x := 1;
x := x / 6;
SetRoundMode(rmNearest);
y := 1;
y := y / 6;
Writeln(x = y); // false
Writeln(x - y); // 0.0001; i.e. 0.1666 vs 0.1667
end.
It is possible that a third-party library you are using is setting the control word to a different value. You may want to set the control word (i.e. rounding mode) explicitly at the starting point of your important calculations.
Also, if your calculations ever transfer into plain floating point and then back into Currency, all bets are off - too hard to audit. Make sure all your calculations are in Currency.
No, Currency is not a floating point type. It is a fixed-precision decimal, implemented with integer storage. It can be compared exactly, and does not have the rounding issues of, say, Double. Therefore, if you are seeing inexact values in your Currency variables, the problem is not the Currency type itself, but what you are putting into it. Most likely, you have a floating-point calculation somewhere else in your code. Since you do not show that code, it's hard to be of more help on this question. But the solution, generally speaking, will be to round your floating point numbers to the correct precision before storing in the Currency variable, rather than doing an inexact comparison on the Currency variables.
Faster and safer way of comparing two currency values is certainly to map the variables to their internal Int64 representation:
function CompCurrency(var A,B: currency): Int64;
var A64: Int64 absolute A;
B64: Int64 absolute B;
begin
result := A64-B64;
end;
This will avoid any rounding error during comparison (working with *10000 integer values), and will be faster than the default FPU-based implementation (especially under 64 bit XE2 compiler).
See this article for additional information.
If your situation is like mine, you might find this approach helpful. I work mostly in payroll. If a business has say 3 departments and wants to charge the cost of an employee evenly among those three departments, there are a lot of times when there will be rounding issues.
What I have been doing is loop through the departments charging each one a third of the total cost and adding the cost charged to a subtotal (currency) variable. But when the loop variable equals the limit, rather than multiplying by the fraction, I subtract the subtotal variable from the total cost and put that in the last department. Since the journal entries that result from this process always have to balance, I believe that it has always worked.
See thread:
D7 / DUnit: all CheckEquals(Currency, Currency) tests suddenly fail ...
https://forums.codegear.com/thread.jspa?threadID=16288
It looks like a change on our development workstations caused Currency comparision to fail. We have not found the root cause, but on two computers running Windows 2000 SP4, and independent of the version of gds32.dll (InterBase 7.5.1 or 2007) and Delphi (7 and 2009), this line
TIBDataBase.Create(nil);
changes the value of to 8087 control word from $1372 to $1272 now.
And all Currency comparisions in unit tests will fail with funny messages like
Expected: <12.34> - Found: <12.34>
The gds32.dll has not been modified, so I guess that there is a dependency in this library to a third party dll which modifies the control word.
To avoid possible issues with currency rounding in Delphi use 4 decimal places.
This will ensure that you never having rounding issues when doing calcualtions with very small amounts.
"Been there. Done That. Written the unit tests."
Related
I am using neo4j to calculate some statistics on a data set. For that I am often using sum on a floating point value. I am getting different results depending on the circumstances. For example, a query that does this:
...
WITH foo
ORDER BY foo.fooId
RETURN SUM(foo.Weight)
Returns different result than the query that simply does the sum:
...
RETURN SUM(foo.Weight)
The differences are miniscule (293.07724195098984 vs 293.07724195099007). But it is enough to make simple equality checks fail. Another example would be a different instance of the database, loaded with the same data using the same loading process can produce the same issue (the dbs might not be 1:1, the load order of some relations might be different). I took the raw values that neo4j sums (by simply removing the SUM()) and verified that they are the same in all cases (different dbs and ordered/not ordered).
What are my options here? I don't mind losing some precision (I already tried to cut down the precision from 15 to 12 decimal places but that did not seem to work), but I need the results to match up.
Because of rounding errors, floats are not associative. (a+b)+c!=a+(b+c).
The result of every operation is rounded to fit the floats coding constraints and (a+b)+c is implemented as round(round(a+b) +c) while a+(b+c) as round(a+round(b+c)).
As an obvious illustration, consider the operation (2^-100 + 1 -1). If interpreted as a (2^-100 + 1)-1, it will return 0, as 1+2^-100 would require a precision too large for floats or double coding in IEEE754 and can only be coded as 1.0. While (2^-100 +(1-1)) correctly returns 2^-100 that can be coded by either floats or doubles.
This is a trivial example, but these rounding errors may exist after every operation and explain why floating point operations are not associative.
Databases generally do not return data in a garanteed order and depending on the actual order, operations will be done differently and that explains the behaviour that you have.
In general, for this reason, it not a good idea to do equality comparison on floats. Generally, it is advised to replace a==b by abs(a-b) is "sufficiently" small.
"sufficiently" may depend on your algorithm. float are equivalent to ~6-7 decimals and doubles to 15-16 decimals (and I think that it is what is used on your DB). Depending on the number of computations, you may have the last 1--3 decimals affected.
The best is probably to use
abs(a-b)<relative-error*max(abs(a),abs(b))
where relative-error must be adjusted to your problem. Probably something around 10^-13 can be correct, but you must experiment, as rounding errors depends on the number of computations, on the dispersion of the values and on what you may consider as "equal" for you problem.
Look at this site for a discussion on comparison methods. And read What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg that discusses, among others, these problems.
I'm using Delphi XE6 to perform a complicated floating point calculation. I realize the limitations of floating point numbers so understand the inaccuracies inherent in FP numbers. However this particular case, I always get 1 of 2 different values at the end of the calculation.
The first value and after a while (I haven't figured out why and when), it flips to the second value, and then I can't get the first value again unless I restart my application. I can't really be more specific as the calculation is very complicated. I could almost understand if the value was somewhat random, but just 2 different states is a little confusing. This only happens in the 32-bit compiler, the 64 bit compiler gives one single answer no matter how many times I try it. This number is different from the 2 from the 32-bit calculation, but I understand why that is happening and I'm fine with it. I need consistency, not total accuracy.
My one suspicion is that perhaps the FPU is being left in a state after some calculation that affects subsequent calculations, hence my question about clearing all registers and FPU stack to level out the playing field. I'd call this CLEARFPU before I start of the calculation.
After some more investigation I realized I was looking in the wrong place. What you see is not what you get with floating point numbers. I was looking at the string representation of the numbers and thinking here are 4 numbers going into a calculation ALL EQUAL and the result is different. Turns out the numbers only seemed to be the same. I started logging the hex equivalent of the numbers, worked my way back and found an external dll used for matrix multiplication the cause of the error. I replaced the matrix multiplication with a routine written in Delphi and all is well.
Floating point calculations are deterministic. The inputs are the input data and the floating point control word. With the same input, the same calculation will yield repeatable output.
If you have unpredictable results, then there will be a reason for it. Either the input data or the floating point control word is varying. You have to diagnose what that reason for that is. Until you understand the problem fully, you should not be looking for a problem. Do not attempt to apply a sticking plaster without understanding the disease.
So the next step is to isolate and reproduce the problem in a simple piece of code. Once you can reproduce the issue you can solve the problem.
Possible explanations include using uninitialized data, or external code modifying the floating point control word. But there could be other reasons.
Uninitialized data is plausible. Perhaps more likely is that some external code is modifying the floating point control word. Instrument your code to log the floating point control word at various stages of execution, to see if it ever changes unexpectedly.
You've probably been bitten by combination of optimization and excess x87 FPU precision resulting in the same bit of floating-point code in your source code being duplicated with different assembly code implementations with different rounding behaviour.
The problem with x87 FPU math
The basic problem is that while x87 FPU the supports 32-bit, 64-bit and 80-bit floating-point value, it only has 80-bit registers and the precision of operations is determined by the state of the bits in the floating point control word, not the instruction used. Changing the rounding bits is expensive, so most compilers don't, and so all floating point operations end being be performed at the same precision regardless of the data types involved.
So if the compiler sets the FPU to use 80-bit rounding and you add three 64-bit floating point variables, the code generated will often add the first two variables keeping the unrounded result in a 80-bit FPU register. It would then add the third 64-bit variable to 80-bit value in the register resulting in another unrounded 80-bit value in a FPU register. This can result in a different value being calculated than if the result was rounded to 64-bit precision after each step.
If that resulting value is then stored in a 64-bit floating-point variable then the compiler might write it to memory, rounding it to 64 bits at this point. But if the value is used in later floating point calculations then the compiler might keep it in a register instead. This means what rounding occurs at this point depends on the optimizations the compiler performs. The more its able to keep values in a 80-bit FPU register for speed, the more the result will differ from what you'd get if all floating point operation were rounded according to the size of actual floating point types used in the code.
Why SSE floating-point math is better
With 64-bit code the x87 FPU isn't normally used, instead equivalent scalar SSE instructions are used. With these instructions the precision of the operation used is determined by the instruction used. So with the adding three numbers example, the compiler would emit instructions that added the numbers using 64-bit precision. It doesn't matter if the result gets stored in memory or stays in register, the value remains the same, so optimization doesn't affect the result.
How optimization can turn deterministic FP code into non-deterministic FP code
So far this would explain why you'd get a different result with 32-bit code and 64-bit code, but it doesn't explain why you can get a different result with the same 32-bit code. The problem here is that optimizations can change the your code in surprising ways. One thing the compiler can do is duplicate code for various reasons, and this can cause the same floating point code being executed in different code paths with different optimizations applied.
Since optimization can affect floating point results this can mean the different code paths can give different results even though there's only one code path in the source code. If the code path chosen at run time is non-deterministic then this can cause non-deterministic results even when the in the source code the result isn't dependent on any non-deterministic factor.
An example
So for example, consider this loop. It performs a long running calculation, so every few seconds it prints a message letting the user know how many iterations have been completed so far. At the end of the loop there's simple summation performed using floating-point arithmetic. While there's non-deterministic factor in the loop, the floating-point operation isn't dependent on it. It's always performed regardless of whether progress updated is printed or not.
while ... do
begin
...
if TimerProgress() then
begin
PrintProgress(count);
count := 0
end
else
count := count + 1;
sum := sum + value
end
As optimization the compiler might move the last summing statement into the end of both blocks of the if statement. This lets both blocks finish by jumping back to the start of the loop, saving a jump instruction. Otherwise one of the blocks has to end with a jump to the summing statement.
This transforms the code into this:
while ... do
begin
...
if TimerProgress() then
begin
PrintProgress(count);
count := 0;
sum := sum + value
end
else
begin
count := count + 1;
sum := sum + value
end
end
This can result in the two summations being optimized differently. It may be in one code path the variable sum can be kept in a register, but in the other path its forced out in to memory. If x87 floating point instructions are used here this can cause sum to be rounded differently depending on a non-deterministic factor: whether or not its time to print the progress update.
Possible solutions
Whatever the source of your problem, clearing the FPU state isn't going to solve it. The fact that the 64-bit version works, provides an possible solution, using SSE math instead x87 math. I don't know if Delphi supports this, but it's common feature of C compilers. It's very hard and expensive to make x87 based floating-point math conforming to the C standard, so many C compilers support using SSE math instead.
Unfortunately, a quick search of the Internet suggests the Delphi compiler doesn't have option for using SSE floating-point math in 32-bit code. In that case your options would be more limited. You can try disabling optimization, that should prevent the compiler from creating differently optimized versions of the same code. You could also try to changing the rounding precision in the x87 floating-point control word. By default it uses 80-bit precision, but all your floating point variables are 64-bit then changing the FPU to use 64-bit precision should significantly reduce the effect optimization has on rounding.
To do the later you can probably use the Set8087CW procedure MBo mentioned, or maybe System.Math.SetPrecisionMode.
Hello I can calculate 17^1000 in calculator of windows 7 and it looks like
1.2121254521552524e+123 (which seems to me to be not correct)
how can I write it in delphi and I want to use for example 1.2121254521552524e+123 mod 18
or 1.2121254521552524e+123 mod 100.
Another example: 17 mod 5 = 2
How can I write it can anyone help me?
You would need to use some sort of extended precision type. Such a type would not be a primitive type, you would need to either use an existing one or write your own (which would be a huge amount of work, and a classic case of reinventing the wheel). Java might be a better language for this because its system libraries include BigInteger and BigDecimal classes, which handle the functionality you would need.
Edit: Here are some delphi libraries providing large integer and high precision floating point arithmetic: http://www.delphiforfun.org/programs/Library/big_integers.htm
That said, if you find yourself unable to use the windows calculator to accomplish what you are looking for and you only need this for one or two things, consider using a more powerful online service such as WolframAlpha.
Also, in case you still don't have an answer:
17^1000 mod 18 == 1
17^1000 mod 100 == 1
The algorithm used to compute numbers like these is simple and does not require large integer support. Consider the following pseudocode:
modular_exponentiation (base, exponent, modulus):
let value = 1
let c_exponent = 0
for c_exponent less than exponent:
let value = value * base
if value greater than or equal to modulus:
let value = (value) mod (modulus)
increment c_exponent
value.
Good morning all,
I'm having some issues with floating point math, and have gotten totally lost in ".to_f"'s, "*100"'s and ".0"'s!
I was hoping someone could help me with my specific problem, and also explain exactly why their solution works so that I understand this for next time.
My program needs to do two things:
Sum a list of decimals, determine if they sum to exactly 1.0
Determine a difference between 1.0 and a sum of numbers - set the value of a variable to the exact difference to make the sum equal 1.0.
For example:
[0.28, 0.55, 0.17] -> should sum to 1.0, however I keep getting 1.xxxxxx. I am implementing the sum in the following fashion:
sum = array.inject(0.0){|sum,x| sum+ (x*100)} / 100
The reason I need this functionality is that I'm reading in a set of decimals that come from excel. They are not 100% precise (they are lacking some decimal points) so the sum usually comes out of 0.999999xxxxx or 1.000xxxxx. For example, I will get values like the following:
0.568887955,0.070564759,0.360547286
To fix this, I am ok taking the sum of the first n-1 numbers, and then changing the final number slightly so that all of the numbers together sum to 1.0 (must meet validation using the equation above, or whatever I end up with). I'm currently implementing this as follows:
sum = 0.0
array.each do |item|
sum += item * 100.0
end
array[i] = (100 - sum.round)/100.0
I know I could do this with inject, but was trying to play with it to see what works. I think this is generally working (from inspecting the output), but it doesn't always meet the validation sum above. So if need be I can adjust this one as well. Note that I only need two decimal precision in these numbers - i.e. 0.56 not 0.5623225. I can either round them down at time of presentation, or during this calculation... It doesn't matter to me.
Thank you VERY MUCH for your help!
If accuracy is important to you, you should not be using floating point values, which, by definition, are not accurate. Ruby has some precision data types for doing arithmetic where accuracy is important. They are, off the top of my head, BigDecimal, Rational and Complex, depending on what you actually need to calculate.
It seems that in your case, what you're looking for is BigDecimal, which is basically a number with a fixed number of digits, of which there are a fixed number of digits after the decimal point (in contrast to a floating point, which has an arbitrary number of digits after the decimal point).
When you read from Excel and deliberately cast those strings like "0.9987" to floating points, you're immediately losing the accurate value that is contained in the string.
require "bigdecimal"
BigDecimal("0.9987")
That value is precise. It is 0.9987. Not 0.998732109, or anything close to it, but 0.9987. You may use all the usual arithmetic operations on it. Provided you don't mix floating points into the arithmetic operations, the return values will remain precise.
If your array contains the raw strings you got from Excel (i.e. you haven't #to_f'd them), then this will give you a BigDecimal that is the difference between the sum of them and 1.
1 - array.map{|v| BigDecimal(v)}.reduce(:+)
Either:
continue using floats and round(2) your totals: 12.341.round(2) # => 12.34
use integers (i.e. cents instead of dollars)
use BigDecimal and you won't need to round after summing them, as long as you start with BigDecimal with only two decimals.
I think that algorithms have a great deal more to do with accuracy and precision than a choice of IEEE floating point over another representation.
People used to do some fine calculations while still dealing with accuracy and precision issues. They'd do it by managing the algorithms they'd use and understanding how to represent functions more deeply. I think that you might be making a mistake by throwing aside that better understanding and assuming that another representation is the solution.
For example, no polynomial representation of a function will deal with an asymptote or singularity properly.
Don't discard floating point so quickly. I could be that being smarter about the way you use them will do just fine.
I can write for..do process for integer value..
But I can't write it for int64 value.
For example:
var
i:int64;
begin
for i:=1 to 1000 do
end;
The compiler refuses to compile this, why does it refuse?
The Delphi compiler simply does not support Int64 loop counters yet.
Loop counters in a for loop have to be integers (or smaller).
This is an optimization to speed up the execution of a for loop.
Internally Delphi always uses an Int32, because on x86 this is the fastest datatype available.
This is documented somewhere deep in the manual, but I don't have a link handy right now.
If you must have a 64 bit loop counter, use a while..do or repeat..until loop.
Even if the compiler did allow "int64" in a Delphi 7 for-loop (Delphi 7???), it probably wouldn't complete iterating through the full range until sometime after the heat death of the Sun.
So why can't you just use an "integer"?
If you must use an int64 value ... then simply use a "while" loop instead.
Problem solved :)
Why to use a Int64 on a for-loop?
Easy to answer:
There is no need to do a lot of iterations to need a Int64, just do a loop from 5E9 to 5E9+2 (three iterations in total).
It is just that values on iteration are bigger than what Int32 can hold
An example:
procedure Why_Int64_Would_Be_Great_On_For_Loop;
const
StartValue=5000000000; // Start form 5E9, 5 thousand millons
Quantity=10; // Do it ten times
var
Index:Int64;
begin
for Index:=StartValue to StartValue+Quantity-1
do begin // Bla bla bla
// Do something really fast (only ten times)
end;
end;
That code would take no time at all, it is just that index value need to be far than 32bit integer limit.
The solution is to do it with a while loop:
procedure Equivalent_For_Loop_With_Int64_Index;
const
StartValue=5000000000; // Start form 5E9, 5 thousand millons
Quantity=10; // Do it ten times
var
Index:Int64;
begin
Index:=StartValue;
while Index<=StartValue+Quantity
do begin // Bla bla bla
// Do something really fast (only ten times)
Inc(Index);
end;
end;
So why the compiler refuses to compile the foor loop, i see no real reason... any for loop can be auto-translated into a while loop... and pre-compiler could do such before compiler (like other optimizations that are done)... the only reason i see is the lazy people that creates the compiler that did not think on it.
If for is optimized and so it is only able to use 32 bit index, then if code try to use a 64 bit index it can not be so optimized, so why not let pre-compiler optimizator to chage that for us... it only gives bad image to programmers!!!
I do not want to make anyone ungry...
I only just say something obvious...
By the way, not all people start a foor loop on zero (or one) values... sometimes there is the need to start it on really huge values.
It is allways said, that if you need to do something a fixed number of times you best use for loop instead of while loop...
Also i can say something... such two versions, the for-loop and the while-loop that uses Inc(Index) are equally fast... but if you put the while-loop step as Index:=Index+1; it is slower; it is really not slower because pre-compiler optimizator see that and use Inc(Index) instead... you can see if buy doing the next:
// I will start the loop from zero, not from two, but i first do some maths to avoid pre-compiler optimizator to convert Index:=Index+Step; to Inc(Index,Step); or better optimization convert it to Inc(Index);
Index:=2;
Step:=Index-1; // Do not put Step:=1; or optimizator will do the convertion to Inc()
Index:=Step-2; // Now fix, the start, so loop will start from zero
while Index<1000000 // 1E6, one millon iterations, from 0 to 999999
do begin
// Do something
Index:=Index+Step; // Optimizator will not change this into Inc(Index), since sees that Step has changed it's value before
end;
The optimizer can see a variable do not change its value, so it can convert it to a constant, then on the increment assign if adding a constant (variable:=variable+constant) it will optimize it to Inc(variable,constant) and in the case it sees such constant is 1 it will also optimes it to Inc(variable)... and such optimizatons in low level computer language are very noticeble...
In Low level computer language:
A normal add (variable:=variable1+variable2) implies two memory reads plus one sum plus one memory write... lot of work
But if is a (variable:=variable+othervariable) it can be optimized holding variable inside the processor cache.
Also if it is a (variable:=variable1+constant) it can also be optimized by holding constant on the processor cache
And if it is (variable:=variable+constant) both are cached on processor cache, so huge fast compared with other options, no acces to RAM is needed.
In such way pre-compiler optimizer do another important optimization... for-loops index variables are holded as processor registers... much more faster than processor cache...
Most mother processor do an extra optimization as well (at hardware level, inside the processor)... some cache areas (32 bit variables for us) seen that are intensivly used are stored as special registers to fasten access... and such for-loop / while-loop indexes are ones of them... but as i said.. most mother AMD proccesors (the ones that uses MP technology does that)... i do not yet know any Intel that do that!!! such optimization is more relevant when multi-core and on super-computing... so maybe that is the reason why AMD has it and Intel not!!!
I only want to show one "why", there are a lot more... another one could be as simple as the index is stored on a database Int64 field type, etc... there are a lot of reasons i know and a lot more i did not know yet...
I hope this will help to understand the need to do a loop on a Int64 index and also how to do it without loosing speed by correctly eficiently converting loop into a while loop.
Note: For x86 compiles (not for 64bit compilation) beware that Int64 is managed internally as two Int32 parts... and when modifing values there is an extra code to do, on adds and subs it is very low, but on multiplies or divisions such extra is noticeble... but if you really need Int64 you need it, so what else to do... and imagine if you need float or double, etc...!!!