Float comparison (equality) in CoreGraphics

Apple CoreGraphics.framework, CGGeometry.h:
CG_INLINE bool __CGSizeEqualToSize(CGSize size1, CGSize size2)
{
    return size1.width == size2.width && size1.height == size2.height;
}
#define CGSizeEqualToSize __CGSizeEqualToSize
Why do they (Apple) compare floats with ==? I can't believe this is a mistake, so can you explain it to me?
(I'd have expected something like fabs(size1.width - size2.width) < 0.001.)

Floating-point comparisons are native width on all OS X and iOS architectures.
For float, that comes to:
i386, x86_64: a 32-bit XMM register (or memory for the second operand), using instructions in the ucomiss family
ARM: a 32-bit register, using instructions in the vcmp family
Some of the floating-point comparison issues are avoided by restricting storage to 32 or 64 bits for these types; other platforms may make heavy use of the native 80-bit x87 FPU. On OS X, SSE instructions are favored, and they use natural widths. That eliminates many of the usual floating-point comparison issues.
But there is still room for error, or times when you will favor approximation. One hidden detail about CGGeometry types' values is that they may be rounded to a nearby integer (you may want to do this yourself in some cases).
Given the range of CGFloat (float, or double on x86_64) and typical values, it's reasonable to assume the rounded values will generally be represented accurately enough that the results will be suitably comparable in the majority of cases. Therefore, it's "pretty safe", "pretty accurate", and "pretty fast" within those confines.
There are still times when you may prefer approximate comparisons in geometry calculations, but Apple's implementation is what I'd consider the closest to a reference implementation for a general-purpose solution in this context.
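For the cases where an approximate comparison really is preferable, a tolerance-based helper along the lines the question sketches might look like this (a minimal sketch; the function name and the tolerance are illustrative choices, not part of CoreGraphics):

#include <CoreGraphics/CGGeometry.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: treats two sizes as equal when both dimensions
   differ by less than a caller-chosen tolerance. */
static bool MyCGSizeIsNearlyEqualToSize(CGSize a, CGSize b, CGFloat tolerance)
{
    return fabs(a.width  - b.width)  < tolerance &&
           fabs(a.height - b.height) < tolerance;
}

/* Usage: MyCGSizeIsNearlyEqualToSize(viewSize, targetSize, 0.001); */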


Is it possible to clear the FPU?

I'm using Delphi XE6 to perform a complicated floating-point calculation. I realize the limitations of floating-point numbers, so I understand the inaccuracies inherent in FP numbers. However, in this particular case I always get one of two different values at the end of the calculation.
It starts with the first value and, after a while (I haven't figured out why or when), it flips to the second value, and then I can't get the first value again unless I restart my application. I can't really be more specific, as the calculation is very complicated. I could almost understand if the value were somewhat random, but just two different states is a little confusing. This only happens in the 32-bit compiler; the 64-bit compiler gives one single answer no matter how many times I try it. This number is different from the two from the 32-bit calculation, but I understand why that is happening and I'm fine with it. I need consistency, not total accuracy.
My one suspicion is that perhaps the FPU is being left in a state after some calculation that affects subsequent calculations, hence my question about clearing all registers and the FPU stack to level the playing field. I'd call this CLEARFPU before the start of the calculation.
After some more investigation I realized I was looking in the wrong place. What you see is not what you get with floating-point numbers: I was looking at the string representation of the numbers and thinking, here are four numbers going into a calculation, all equal, yet the result is different. It turns out the numbers only seemed to be the same. I started logging the hex equivalent of the numbers, worked my way back, and found that an external DLL used for matrix multiplication was the cause of the error. I replaced the matrix multiplication with a routine written in Delphi and all is well.
Floating point calculations are deterministic. The inputs are the input data and the floating point control word. With the same input, the same calculation will yield repeatable output.
If you have unpredictable results, then there will be a reason for it. Either the input data or the floating point control word is varying. You have to diagnose what that reason is. Until you understand the problem fully, you should not be looking for a fix. Do not attempt to apply a sticking plaster without understanding the disease.
So the next step is to isolate and reproduce the problem in a simple piece of code. Once you can reproduce the issue you can solve the problem.
Possible explanations include using uninitialized data, or external code modifying the floating point control word. But there could be other reasons.
Uninitialized data is plausible. Perhaps more likely is that some external code is modifying the floating point control word. Instrument your code to log the floating point control word at various stages of execution, to see if it ever changes unexpectedly.
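The question is about Delphi, but for illustration, on an x86 C toolchain with glibc the control word can be read and logged with the <fpu_control.h> macros (a sketch only; Delphi's Get8087CW plays the same role):

#include <fpu_control.h>   /* glibc, x86 only */
#include <stdio.h>

/* Log the raw x87 control word so unexpected changes show up in the output. */
static void log_fpu_control_word(const char *where)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    printf("%s: x87 control word = 0x%04x\n", where, (unsigned)cw);
}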
You've probably been bitten by a combination of optimization and excess x87 FPU precision, resulting in the same bit of floating-point code in your source being compiled into different assembly implementations with different rounding behaviour.
The problem with x87 FPU math
The basic problem is that while the x87 FPU supports 32-bit, 64-bit and 80-bit floating-point values, it only has 80-bit registers, and the precision of operations is determined by the state of the precision bits in the floating-point control word, not by the instruction used. Changing the precision bits is expensive, so most compilers don't, and so all floating-point operations end up being performed at the same precision regardless of the data types involved.
So if the compiler sets the FPU to use 80-bit precision and you add three 64-bit floating-point variables, the generated code will often add the first two variables, keeping the unrounded result in an 80-bit FPU register. It will then add the third 64-bit variable to the 80-bit value in the register, resulting in another unrounded 80-bit value in an FPU register. This can result in a different value being calculated than if the result were rounded to 64-bit precision after each step.
If that resulting value is then stored in a 64-bit floating-point variable, the compiler might write it to memory, rounding it to 64 bits at that point. But if the value is used in later floating-point calculations, the compiler might keep it in a register instead. This means the rounding that occurs at this point depends on the optimizations the compiler performs. The more it's able to keep values in an 80-bit FPU register for speed, the more the result will differ from what you'd get if every floating-point operation were rounded according to the size of the actual floating-point types used in the code.
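To make the effect concrete, here is a small C sketch of the same idea (an illustration, not Delphi: it assumes an SSE-based build so that plain double really is rounded at each step, and uses long double to stand in for the 80-bit x87 register; the values are chosen only so the difference is visible):

#include <stdio.h>

int main(void)
{
    double big  = 1.0;
    double tiny = 1e-18;   /* far below half an ulp of 1.0 for a 64-bit double */

    /* Rounded to 64 bits after every step (e.g. SSE builds): tiny is lost both times. */
    double rounded_each_step = ((big + tiny) + tiny) - big;      /* 0.0 */

    /* Kept at extended precision, as an 80-bit x87 register would keep it
       (on targets where long double is the x87 extended format): the two
       tiny contributions survive until the final store. */
    long double wide = (((long double)big + tiny) + tiny) - big;
    double kept_in_register = (double)wide;                      /* ~2e-18 */

    printf("%g vs %g\n", rounded_each_step, kept_in_register);
    return 0;
}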
Why SSE floating-point math is better
With 64-bit code the x87 FPU isn't normally used; equivalent scalar SSE instructions are used instead. With these instructions the precision of each operation is determined by the instruction itself. So in the three-number addition example, the compiler would emit instructions that add the numbers using 64-bit precision. It doesn't matter whether the result gets stored in memory or stays in a register: the value remains the same, so optimization doesn't affect the result.
How optimization can turn deterministic FP code into non-deterministic FP code
So far this explains why you'd get a different result with 32-bit code and 64-bit code, but it doesn't explain why you can get different results from the same 32-bit code. The problem here is that optimizations can change your code in surprising ways. One thing the compiler can do is duplicate code for various reasons, and this can cause the same floating-point code to be executed in different code paths with different optimizations applied.
Since optimization can affect floating-point results, the different code paths can give different results even though there's only one code path in the source code. If the code path chosen at run time is non-deterministic, this can cause non-deterministic results even when, in the source code, the result doesn't depend on any non-deterministic factor.
An example
So for example, consider this loop. It performs a long-running calculation, so every few seconds it prints a message letting the user know how many iterations have been completed so far. At the end of the loop there's a simple summation performed using floating-point arithmetic. While there's a non-deterministic factor in the loop, the floating-point operation isn't dependent on it: it's always performed regardless of whether the progress update is printed or not.
while ... do
begin
    ...
    if TimerProgress() then
    begin
        PrintProgress(count);
        count := 0
    end
    else
        count := count + 1;
    sum := sum + value
end
As an optimization, the compiler might move the final summing statement into the end of both branches of the if statement. This lets both branches finish by jumping back to the start of the loop, saving a jump instruction; otherwise one of the branches has to end with a jump to the summing statement.
This transforms the code into this:
while ... do
begin
    ...
    if TimerProgress() then
    begin
        PrintProgress(count);
        count := 0;
        sum := sum + value
    end
    else
    begin
        count := count + 1;
        sum := sum + value
    end
end
This can result in the two summations being optimized differently. It may be that in one code path the variable sum can be kept in a register, while in the other path it's forced out into memory. If x87 floating-point instructions are used here, this can cause sum to be rounded differently depending on a non-deterministic factor: whether or not it's time to print the progress update.
Possible solutions
Whatever the source of your problem, clearing the FPU state isn't going to solve it. The fact that the 64-bit version works suggests a possible solution: using SSE math instead of x87 math. I don't know if Delphi supports this, but it's a common feature of C compilers. It's very hard and expensive to make x87-based floating-point math conform to the C standard, so many C compilers support using SSE math instead.
Unfortunately, a quick search of the Internet suggests the Delphi compiler doesn't have an option for using SSE floating-point math in 32-bit code. In that case your options are more limited. You can try disabling optimization; that should prevent the compiler from creating differently optimized versions of the same code. You could also try changing the rounding precision in the x87 floating-point control word. By default it uses 80-bit precision, but if all your floating-point variables are 64-bit, then changing the FPU to use 64-bit precision should significantly reduce the effect optimization has on rounding.
To do the latter you can probably use the Set8087CW procedure MBo mentioned, or maybe System.Math.SetPrecisionMode.
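For comparison, on a C toolchain targeting x86 with glibc, the equivalent of SetPrecisionMode is a direct tweak of the x87 control word (a sketch only; Delphi's Set8087CW takes the raw control-word value in much the same way):

#include <fpu_control.h>   /* glibc, x86 only */

/* Switch the x87 FPU from its default 80-bit (extended) precision to
   64-bit (double) precision, leaving the other control bits alone. */
static void use_double_precision(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;   /* _FPU_EXTENDED masks both precision bits */
    _FPU_SETCW(cw);
}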

getShaderPrecisionFormat return values

I'm confused about the method getShaderPrecisionFormat, what it's used for, and what it's telling me, because for me it always returns the exact same precision for all arguments; the only differences are between INT and FLOAT.
To be clear:
Calls with gl.FRAGMENT_SHADER and gl.VERTEX_SHADER in combination with gl.LOW_FLOAT, gl.MEDIUM_FLOAT and gl.HIGH_FLOAT always return
WebGLShaderPrecisionFormat { precision: 23, rangeMax: 127, rangeMin: 127 }
Calls with gl.FRAGMENT_SHADER and gl.VERTEX_SHADER in combination with gl.LOW_INT, gl.MEDIUM_INT and gl.HIGH_INT always return
WebGLShaderPrecisionFormat { precision: 0, rangeMax: 24, rangeMin: 24 }
I experimented with also supplying two additional arguments "range" and "precision" but was unable to get any different results. I assume I made a mistake but from the docs I'm unable to figure out on my own how to use it correctly.
It looks like you're using these calls correctly.
If you're running on a desktop/laptop, the result is not surprising. I would expect WebGL to be layered on top of a full OpenGL implementation on such systems. Even if these systems support ES 2.0, which mostly matches the WebGL feature level, that's most likely just a reduced API that ends up using the same underlying driver/GPU features as the full OpenGL implementation.
Full OpenGL does not really support precisions. It does have the keywords in GLSL, but that's just for source code compatibility with OpenGL ES. In the words of the GLSL 4.50 spec:
Precision qualifiers are added for code portability with OpenGL ES, not for functionality. They have the same syntax as in OpenGL ES, as described below, but they have no semantic meaning, which includes no effect on the precision used to store or operate on variables.
It then goes on to define the use of IEEE 32-bit floats, which have the 23 bits of precision you are seeing from your calls.
You would most likely get a different result if you try the same thing on a mobile device, like a phone or tablet. Many mobile GPUs support 16-bit floats (aka "half floats"), and take advantage of them. Some of them can operate on half floats faster than they can on floats, and the reduced memory usage and bandwidth is beneficial even if the operations themselves are not faster. Reducing memory/bandwidth usage is critical on mobile devices to improve performance, as well as power efficiency.
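For reference, here is what the same query looks like at the OpenGL ES level that WebGL wraps (a sketch; it assumes a current ES 2.0 context, and the helper name is made up):

#include <GLES2/gl2.h>
#include <stdio.h>

/* Print the reported precision for one shader stage / precision qualifier. */
static void print_precision(GLenum shader_type, GLenum precision_type,
                            const char *label)
{
    GLint range[2];   /* log2 of the smallest and largest representable magnitudes */
    GLint precision;  /* log2 of the relative precision, i.e. mantissa bits */

    glGetShaderPrecisionFormat(shader_type, precision_type, range, &precision);
    printf("%s: precision=%d rangeMin=%d rangeMax=%d\n",
           label, precision, range[0], range[1]);
}

/* e.g. print_precision(GL_FRAGMENT_SHADER, GL_MEDIUM_FLOAT, "mediump float");
   On a desktop-GL-backed implementation this typically reports 23/127/127,
   i.e. plain IEEE single precision, just like the WebGL numbers above. */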

32 bit multiplication on 24 bit ALU

I want to port a 32-by-32-bit unsigned multiplication to a 24-bit DSP (it's a linear congruential generator, so I'm not allowed to truncate; I also don't want to replace the current LCG with a 24-bit one yet). The available data types are 24- and 48-bit ints.
Only the 32 least significant bits of the result are needed. Do you know any hacks to implement this in fewer multiplies, masks and shifts than the usual way?
The line looks like this:
//val is an int(32 bit)
val = (1664525 * val) + 1013904223;
An outline would be (in my current compiler style):
static uint48_t val = SEED;
...
val = 0xFFFFFFFFUL & ((1664525UL * val) + 1013904223UL);
and hopefully the compiler will recognise:
it can use a multiply and accumulate command
it only needs a reduced multiply algorithm due to the "high word" of the constant being zero
the AND could be effected by resetting the upper bits or multiplying a constant and restoring
...other stuff depends on your {mystery dsp} target
Note: if you scale up the coefficients by 2^16, you can get truncation for free, but due to lack of info you will have to explore/decide if it is better overall.
(This is more an elaboration of why two multiplications 24×24→n, 31<n, are enough for 32×32→min(n, 40).)
The question discloses amazingly little about the capabilities available for building a method for 32×21→32 in fewer [24×24] multiplies, masks and shifts than the usual way, given 24- and 48-bit ints and a DSP (which I read as high-throughput, non-high-latency 24×24→48).
As long as there indeed is a 24×24→48 multiply (or even a 24×24+56→56 MAC) and one factor is less than 24 bits, the question is pointless, a second multiply being the compelling solution.
The usual composition of a (24<n<48)×(24<m<48)→(p>24) multiply from 24×24→48 uses three of the latter; a compiler should know as well as a coder that "the fourth multiply" would yield bits with a significance/position exceeding the combined lengths of the lower parts of the factors.
So, is it possible to generate "the long product" using just a second 24×24→48?
Let the (bytes of the) factors be w_xyz and W_XYZ, respectively, the underscore suggesting that "the w/W" are the lower-significance bytes of the higher-significance words/ints when interpreted as 24-bit ints. The first 24×24→48 multiply (xyz × XYZ) gives the sum of the partial products below, where each character column is one byte position, most significant on the left:
  zX
 yXzY
xXyYzZ
 xYyZ
  xZ
and what is additionally needed (for the low 32 bits of the full product) is
 wZ +
 zW.
This can be computed using one combined multiplication of
((w<<16)|(z & 0xff)) × ((W<<16)|(Z & 0xff)). (Never mind the 17th bit of wZ+zW "running" into wW.)
(In the first revision of this answer, I foolishly produced wZ and zW separately - their sum is wanted in the end, anyway.)
(Annoyingly, this is about all you can do for 24×24→24 as a base operation too - beyond this "combining multiplication", you need four instead of one.)
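As a cross-check of the byte algebra above, here is a small C model of the two-multiply scheme (a sketch: uint64_t stands in for the DSP's 48-bit type, mul24 models the 24×24→48 instruction, and the constants are just the question's LCG):

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the DSP's 24x24 -> 48 multiply. */
static uint64_t mul24(uint32_t a, uint32_t b)
{
    return (uint64_t)(a & 0xFFFFFFu) * (b & 0xFFFFFFu);
}

/* Low 32 bits of a 32x32 product using only two 24x24 -> 48 multiplies. */
static uint32_t mul32_lo(uint32_t a, uint32_t b)
{
    uint32_t xyz = a & 0xFFFFFFu, XYZ = b & 0xFFFFFFu;  /* low 24-bit parts */
    uint32_t w   = a >> 24,       W   = b >> 24;        /* top bytes        */
    uint32_t z   = a & 0xFFu,     Z   = b & 0xFFu;      /* bottom bytes     */

    uint64_t low  = mul24(xyz, XYZ);
    /* ((w<<16)|z) * ((W<<16)|Z): bits 16..23 of this product hold (wZ + zW) mod 2^8,
       which is exactly the byte needed at positions 24..31 of the final result. */
    uint64_t comb = mul24((w << 16) | z, (W << 16) | Z);
    uint32_t cross = (uint32_t)(comb >> 16) & 0xFFu;

    return (uint32_t)low + (cross << 24);
}

int main(void)
{
    uint32_t val = 0x89ABCDEFu;   /* arbitrary 32-bit state */
    uint32_t direct = 1664525u * val + 1013904223u;          /* the LCG step */
    uint32_t split  = mul32_lo(1664525u, val) + 1013904223u; /* same, via two 24x24s */
    printf("%08X %08X\n", direct, split);                    /* should match */
    return 0;
}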
Another angle to explore is choosing a different PRNG.
It may have to be >24 bits (tell!).
On a 24 bit machine, XorShift* (or even XorShift+) 48/32 seems worth a look.

Replacement for `fabs`, `fmax`, etc. for use with CGFloat on 64-bit iOS devices

Question
Consider layout code like this:
CGFloat descriptionHeight = // height of a paragraph of text with zero or more words;
CGFloat imageHeight = // height of an image;
CGFloat yCoordinateOfSomeOtherView = fmax(descriptionHeight, imageHeight) + 5.0f;
How should the third line be rewritten with support for 64-bit iOS devices?
(The current code doesn't take into account whether yCoordinateOfSomeOtherView is a float or a double.)
Options
A few options (I'm not sure which is best):
1. Define my own macro
#if defined(__LP64__) && __LP64__
#define cgfmax(x,y) fmax(x,y)
#else
#define cgfmax(x,y) fmaxf(x,y)
#endif
I could then replace fmax with cgfmax.
2. Use tgmath.h
This Stack Overflow answer from 2011 suggests replacing math.h with tgmath.h. That header replaces fmax with a type-generic macro that inspects the argument types (via __typeof__ tricks) and expands to either fmax or fmaxf accordingly.
3. Ignore this issue
Since CGFloats relate to layout, the data loss potentially incurred storing a float into a double will usually be insignificant. That is, they'll represent tiny fractions of pixels.
Summary
I'm looking for any other options or helpful advice from someone who's done a 32-bit to 64-bit transition before.
There is never any "data loss" incurred when converting a float into a double. That conversion is always exact.
Your solutions 1 & 2 are entirely equivalent semantically, though (2) is more stylistically correct.
Your solution 3 is formally not equivalent; it may incur extra conversions between float and double, which may make it slightly less efficient (but the actual result is identical).
Basically, it doesn't matter, but use tgmath.
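Concretely, with option 2 the original line can stay almost as written (a sketch; the function and variable names are just the ones from the question):

#include <tgmath.h>               /* instead of math.h */
#include <CoreGraphics/CGBase.h>  /* CGFloat */

CGFloat LayoutY(CGFloat descriptionHeight, CGFloat imageHeight)
{
    /* The type-generic fmax expands to fmaxf when CGFloat is float (32-bit)
       and to fmax when CGFloat is double (64-bit), so no cast or macro is needed. */
    return fmax(descriptionHeight, imageHeight) + 5.0;
}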

CGFloat-based math functions?

In both OS X and iOS, Apple is using the CGFloat typedef to automatically get a float on 32-bit systems and a double on 64-bit systems. But when using the normal Unix math functions you need to make that decision on a case by case basis.
For example the floor() function is defined as:
double floor(double x);
and the floorf() function is defined as:
float floorf(float x);
I know all iOS devices are 32-bit today, but the reason to use CGFloat is to automatically get improved precision when the first 64-bit iOS devices are introduced (iPad 5?) without having to change any code. But to make that possible we need CGFloat-based math functions:
CGFloat Floor(CGFloat x);
Are there any functions like that in iOS or any third-party libraries? Or how do other developers handle this? I suspect most are using CGFloat together with the float versions on iOS, but then you would "need" to change every line with a math function if compiling for a 64-bit iOS device (and handle two different versions of your code base).
Or you could just use float instead of CGFloat everywhere and not worry about 64-bit iOS. But then you would get inconsistency with Apple's libraries and, IMHO, uglier code.
Or you could maybe use CGFloat together with the double versions and just take the space and performance hit of letting the compiler convert between double and float all the time, and not care about possible "Implicit conversion to 32 Bit Type" warnings.
Or maybe the best strategy is to bet that no 64-bit version of iOS will ever arrive (or at least not in the near future), use CGFloat together with the float versions of the Unix math functions, and not worry about the future.
What strategy are you using?
Edit: So this question is no longer theoretical now that we have 64-bit iOS!
I think the easiest way to deal with this is to use tgmath.h instead of math.h (math.h gets imported by Cocoa automatically, tgmath.h doesn't, so you'll need to include it somewhere).
There are a number of potential solutions discussed at the Cocoa mailing list thread:
http://www.cocoabuilder.com/archive/cocoa/291802-math-functions-with-cgfloat.html
I would agree with the majority of people on that list that tgmath is probably the way to go. I believe it was originally intended to allow easy porting of Fortran code, but it also works in this scenario. TGMath will let you do:
double floor(double x);
float floor(float x);
long double floor(long double x);
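So in practice the CGFloat-friendly Floor you ask for already exists as a type-generic macro; usage would look roughly like this (a sketch, with an arbitrary helper name):

#include <tgmath.h>              /* must be included explicitly; Cocoa only pulls in math.h */
#include <CoreGraphics/CGBase.h>

CGFloat SnapDown(CGFloat x)
{
    /* Resolves to floorf when CGFloat is float and to floor when it is double. */
    return floor(x);
}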
