How safe is comparing numbers in lua with equality operator? - lua

In my engine I have a Lua VM for scripting. In the scripts, I write things like:
stage = stage + 1
if (stage == 5) then ... end
and
objnum = tonumber("5")
if (stage == objnum)
According to the Lua sources, Lua uses a simple equality operator when comparing doubles, the internal number type it uses.
I am aware of precision problems when dealing with floating point values, so I want to know if the comparison is safe, that is, will there be any problems with simply comparing these numbers using Lua's default '==' operation? If so, are there any countermeasures I can employ to make sure 1+2 always compares as equal to 3? Will converting the values to strings work?

You may be better off by converting to string and then comparing the results if you only care about equality in some cases. For example:
> print(21, 0.07*300, 21 == 0.07*300, tostring(21) == tostring(0.07*300))
21 21 false true
I learned this hard way when I gave my students an assignment with these numbers (0.07 and 300) and asked them to implement a unit test that then miserably failed complaining that 21 is not equal 21 (it was comparing actual numbers, but displaying stringified values). It was a good reason for us to have a discussion about comparing floating point values.

I can employ to make sure 1+2 always compares as equal to 3?
You needn't worry. The number type in Lua is double, which can hold many more integers exactly than a long int.

Comparison and basic operations on doubles is safe in certain situations. In particular if the numbers and their result can be expressed exactly - including all low value integers.
So 2+1 == 3 will be fine for doubles.
NOTE: I believe there's even some guarantees for certain math functions ( like pow and sqrt ) and if your compiler/library respects those then sqrt(4.0)==2.0 or 4.0 == pow(2.0,2.0) will reliably be true.

By default, Lua is compiled with c++ floats, and behind the scenes number comparisons boils down to float comparisons in c/c++, which are indeed problematic and discussed in several threads, e.g. most-effective-way-for-float-and-double-comparison.
Lua makes the situation only slightly worse by converting all numbers, including c++ integers, into floats. So you need to keep it in mind.

Related

Dart int and double being interned? Treated specially by identical()?

Dart has both:
an equality operator == and
a top-level function named identical().
By the choice of syntax, it feels natural to want to use Dart's == operator more frequently than identical(), and I like that. In fact, the Section on Equality of the Idiomatic Dart states that "in practice, you will rarely need to use" identical().
In a recent answer to one of my questions concerning custom filters, it seems that Angular Dart favors use of identical() rather than == when trying to determine whether changes to a model have reached a steady state. (Which can make sense, I suppose, for large models for reasons of efficiency.)
This got me to thinking about identity of int's and so I wrote some tests of identical() over ints. While I expected that small ints might be "interned/cached" (e.g. similar to what is done by Java's Integer.valueOf()), to my surprise, I can't seem to generate two ints that are equal but not identical. I get similar results for double.
Are int and double values being interned/cached? Or maybe identical() is treating them specially? Coming from a Java background, I used to equate equate Dart's:
== to Java's equal() method and
identical() to Java's equality test ==.
But that now seems wrong. Anyone know what is going on?
Numbers are treated specially. If their bit-pattern is the same they must be identical (although it is still debated if this includes the different versions of NaNs).
The main reasons are expectations, leaking of internal details and efficiency.
Expectations: users expect numbers to be identical. It goes against common sense that x == y (for two integers) but not identical(x, y).
Leaking of internal details: the VM uses SMIs (SMall Integers) to represent integers in a specific range (31 bits on 32-bit machines, 63 on 64-bit machines). These are canonicalized and are always identical. Exposing this internal implementation detail would lead to inconsistent results depending on which platform you run.
Efficiency: the VM wants to unbox numbers wherever it can. For example, inside a method doubles are frequently moved into registers. However, keeping track of the original box can be cumbersome and difficult.
foo(x, y) {
var result = x;
while(y-- > 0) {
result += x;
}
return result;
}
Suppose, that the VM optimizes this function and moves result into a register (unboxing x in the process). This allows for a tight loop where result is then efficiently modified. The difficult case happens, when y is 0. The loop wouldn't execute and foo would return x directly. In other words, the following would need to be true:
var x = 5.0;
identical(x, foo(x, 0)); // should be true.
If the VM unboxed the result variable in the method foo it would need to allocate a fresh box for the result and the identical call would therefore return false.
By modifying the definition of identical all these problems are avoided. It comes with a small cost to the identical check, though.
Seems like I posted too quickly. I just stumbled on Dart Issue 13084: Spec says identical(1.0, 1) is true, even if they have different types which led me to the Dart section on Object Identity of the language spec. (I had previously search for equality in the spec but not object identity.)
Here is an excerpt:
The predefined dart function identical() is defined such that identical(c1, c2) iff:
- c1 evaluates to either null or an instance of
bool and c1 == c2, OR
- c1 and c2 are instances of int and c1 == c2, OR
- c1 and c2 are constant strings and c1 == c2, OR
- c1 and c2 are instances of double and one of the following holds: ...
and there are more clauses dealing with lists, maps and constant objects. See the language spec for the full details. Hence, identical() is much more than just a simple test for reference equality.
I can't remember the source for this, but somewhere on dartlang.org or the issue tracker it was said that num, int and double are indeed getting special treatment. One of those special treatments is that you can't subclass those types for performance reasons, but there may be more. What exactly this special treatment entails can probably only be answered by the developers, or maybe someone who knows the specification by heart, but one thing can be inferred:
The numeric types are dart objects - they have methods you can call on their instances. But they also have qualities of primitive data types, as you can do int i = 3;, while a pure object should have a new keyword somewhere. This is different from Java, where there are real primitive types and real objects wrapping them and exposing instance methods.
While the technical details certainly are more complex, if you think about dart numerics as a blend of object and primitive, your comparison to Java still makes sense. In Java, new Integer(5).equals(new Integer(5)) evaluates to true, and so does 5==5.
I am aware this is not a very technically correct answer, but I hope it's still useful to make sense of the behaviour of dart numerics when coming from a Java background.

Lua floating point operations

I run Lua on a CPU without dedicated floating point HW, depending on SW emulation.
From luaopt.h I can see that some macros are set to double, but it does not clearly state when floats are used and its a little hard to track it.
If my script does simple stuff like:
a=0
a=a+1
for...
Would that involve a floating point operations at any level?
If no that's fine, but what is then the benefit to change macros to long?
(I tried of course but did not work....)
All numeric operations in Lua are performed (according to the default configuration) in floating point. There is no distinction made between floating point and integer, all values are simply numbers.
The actual C type used to store a Lua number is set in luaconf.h, and it is both allowed and even practical to change that to a suitable integral type. You start by changing LUA_NUMBER from double to int, long, or perhaps ptrdiff_t. Then you will find you need to tweak the related macros that control the conversions between strings and numbers. And, of course, you will likely need to eliminate most or all of the base math library since math.sin() and its friends and neighbors are not particularly useful over integers.
The result will be a Lua interpreter where all numbers are integers. The language will still allow you to type 3.14, but it will be stored as 3. Your code will likely not be completely portable to a Lua interpreter built with the standard configuration since a huge amount of Lua code casually assumes that floating point arithmetic is permitted, and remember that your compiled byte code will definitely not be compatible since byte code will store numbers as LUA_NUMBER.
There is LNUM patch (used, for example, by OpenWrt project which relies heavily on Lua for providing Web UI on hardware without FPU) that allows dual integer/floating point representation of numbers in Lua with conversions happening behind the scenes when required. With it most integer computations will be performed without resorting to FPU. Unfortunately, it's only applicable to Lua 5.1; 5.2 is not supported.

What is the difference between Delphi string comparsion functions?

There's a bunch of ways you can compare strings in modern Delphi (say 2010-XE3):
'<=' operator which resolves to UStrCmp / LStrCmp
CompareStr
AnsiCompareStr
Can someone give (or point to) a description of what those methods do, in principle?
So far I've figured that AnsiCompareStr calls CompareString on Windows, which is a "textual" comparison (i.e. takes into account unicode combined characters etc). Simple CompareStr does not do that and seems to do a binary comparison instead.
But what is the difference between CompareStr and UStrCmp? Between UStrCmp and LStrCmp? Do they all produce identical results? Do those results change between versions of Delphi?
I'm asking because I need a comparison which will always produce the same results, so that indexes in app built with one version of Delphi remain consistent with code built with another.
AnsiCompareStr is specified as taking locale into account, and should return identical results regardless of Delphi version, but may return different results based on Windows version and/or settings.. CompareStr is a pure binary comparison: "The comparison operation is based on the 16-bit ordinal value of each character and is not affected by the current locale" (for the CompareStr(const S1, S2: string) overload). UStrCmp also uses a pure binary comparison: "Strings are compared according to the ordinal values that make up the characters that make up the string." So there should not be a difference between the latter two. The way they return the result is different, so two implementations are needed (although it would be possible to make one rely on the other).
As for the differences between LStrCmp and UStrCmp, LStrCmp takes AnsiStrings, UStrCmp takes UnicodeStrings. It's entirely possible that two characters (let's say A and B) are ordered in the misnamed "ANSI" code page as A < B, but are ordered in Unicode as A > B. You should almost always just use the comparison appropriate for the data you have.

Multiplying LongInts. Expression of the form: Int64Var := LongIntVar * LongIntVar

I always thought it was part of the design philosophy in Pascal, that it looked at both the right and left hand sides of an expression when deciding what format/precision to use for an operation. So that, unlike C where an expression like,
Float_Var = 1/3
results in a value of 0.0 for Float_Var, Pascal always gets this stuff right. :)
So I was kind of surprised when I went to multiply two LongInts (32 bit) to give an Int64 result and found I was getting anomalous results. I had to get all C like and use,
Int64_Var := Int64(LongIntVar1) * LongIntVar2
to make it work correctly. (BTW. This was under Delphi, various versions tested, but all win32).
I was just wondering if this is an exceptional case in Delphi/Pascal? Or are there other examples where the usual Pascal way, using the types on both sides of an expression to decide on how the operation is performed, doesn't hold.
If by "both sides" you mean that it looks at the type of the target variable in an assignment for determining the expression type, then no, that has never been the case. Delphi works like any other mainstream compiler in that regard - that is, the type of an expression is determined from the inside out.
I always thought it was part of the design philosophy in
Pascal, that it looked at both the right and left hand sides of
an expression when deciding what format/precision to use
for an operation.
That is not correct. Expressions assignment targets do not influence the evaluation of the expression.
The reason that
Float_Var = 1/3;
evaluates to 0 in C/C++ is that the / operator is overloaded. It can mean either integer division or floating point division. If one of the arguments is floating point then the operator is floating point division, otherwise, as here, it is integer division.
In Delphi the / operator is not overloaded. It is always floating point division. That's why this code gives a compile error:
Int_Var := 1/3;

AnsiStrIComp fails comparing strings in Delphi 2010

I'm slightly confused and hoping for enlightenment.
I'm using Delphi 2010 for this project and I'm trying to compare 2 strings.
Using the code below fails
if AnsiStrIComp(PAnsiChar(sCatName), PAnsiChar(CatNode.CatName)) = 0 then...
because according to the debugger only the first character of each string is being compared (i.e. if sCatName is "Automobiles", PAnsiChar(sCatName) is "A").
I want to be able to compare strings that may be in different languages, for example English vs Japanese.
In this case I am looking for a match, but I have other functions used for sorting, etc. where I need to know how the strings compare (less than, equal, greater than).
I assume that sCatName and CatNode.CatName are defined as strings (= UnicodeStrings)?. They should be.
There is no need to convert the strings to null-terminated strings! This you (mostly) only need to do when working with the Windows API.
If you want to test equality of two strings, use SameStr(S1, S2) (case sensitive matching) or SameText(S1, S2) (case insensitive matching), or simply S1 = S2 in the first case. All three options return true or false, depending on the strings equality.
If you want to get a numerical value based on the ordinal values of the characters (as in sorting), then use CompareStr(S1, S2) or CompareText(S1, S2). These return a negative integer, zero, or a positive integer.
(You might want to use the Ansi- functions: AnsiSameStr, AnsiSameText, AnsiCompareStr, and AnsiCompareText; these functions will use the current locale. The non Ansi- functions will accept a third, optional parameter, explicitly specifying the locale to use.)
Update
Please read Remy Lebeau's comments regarding the cause of the problem.
What about simple sCatName=CatNode.CatName? If they are strings it should work.

Resources