Dart int and double being interned? Treated specially by identical()? - dart

Dart has both:
an equality operator == and
a top-level function named identical().
By the choice of syntax, it feels natural to want to use Dart's == operator more frequently than identical(), and I like that. In fact, the Section on Equality of Idiomatic Dart states that "in practice, you will rarely need to use" identical().
In a recent answer to one of my questions concerning custom filters, it seems that Angular Dart favors use of identical() rather than == when trying to determine whether changes to a model have reached a steady state. (Which can make sense, I suppose, for large models for reasons of efficiency.)
This got me thinking about the identity of ints, and so I wrote some tests of identical() over ints. While I expected that small ints might be "interned/cached" (e.g. similar to what is done by Java's Integer.valueOf()), to my surprise, I can't seem to generate two ints that are equal but not identical. I get similar results for double.
Are int and double values being interned/cached? Or maybe identical() is treating them specially? Coming from a Java background, I used to equate Dart's:
== to Java's equals() method and
identical() to Java's equality test ==.
But that now seems wrong. Anyone know what is going on?

Numbers are treated specially. If their bit pattern is the same they must be identical (although it is still debated whether this includes the different versions of NaN).
The main reasons are expectations, leaking of internal details and efficiency.
Expectations: users expect equal numbers to be identical. It goes against common sense that x == y holds (for two integers) but identical(x, y) does not.
Leaking of internal details: the VM uses SMIs (SMall Integers) to represent integers in a specific range (31 bits on 32-bit machines, 63 on 64-bit machines). These are canonicalized and are always identical. Exposing this internal implementation detail would lead to inconsistent results depending on which platform you run.
Efficiency: the VM wants to unbox numbers wherever it can. For example, inside a method doubles are frequently moved into registers. However, keeping track of the original box can be cumbersome and difficult.
foo(x, y) {
  var result = x;
  while (y-- > 0) {
    result += x;
  }
  return result;
}
Suppose that the VM optimizes this function and moves result into a register (unboxing x in the process). This allows for a tight loop where result is then efficiently modified. The difficult case happens when y is 0. The loop wouldn't execute and foo would return x directly. In other words, the following would need to be true:
var x = 5.0;
identical(x, foo(x, 0)); // should be true.
If the VM unboxed the result variable in the method foo, it would need to allocate a fresh box for the result, and a purely reference-based identical call would therefore return false.
By modifying the definition of identical, all these problems are avoided. It comes with a small cost to the identical check, though.
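For example, with the foo above, the following checks should all print true under this value-based definition of identical (the expected results follow from the definition; run them on the VM to confirm):
main() {
  var x = 5.0;
  print(identical(x, foo(x, 0)));   // true, even if the VM re-boxed the result
  print(identical(5.0, 2.5 + 2.5)); // true: same double value, same bit pattern
  print(identical(1 + 2, 3));       // true: equal ints are identical
}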

Seems like I posted too quickly. I just stumbled on Dart Issue 13084: Spec says identical(1.0, 1) is true, even if they have different types, which led me to the Object Identity section of the Dart language spec. (I had previously searched the spec for equality but not for object identity.)
Here is an excerpt:
The predefined Dart function identical() is defined such that identical(c1, c2) iff:
- c1 evaluates to either null or an instance of bool and c1 == c2, OR
- c1 and c2 are instances of int and c1 == c2, OR
- c1 and c2 are constant strings and c1 == c2, OR
- c1 and c2 are instances of double and one of the following holds: ...
and there are more clauses dealing with lists, maps and constant objects. See the language spec for the full details. Hence, identical() is much more than just a simple test for reference equality.
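A few concrete checks that line up with those clauses (the expected values in the comments follow from the spec text quoted above):
main() {
  print(identical(1000000, 999999 + 1));        // true: equal ints are identical
  print(identical(0.5, 1 / 2));                 // true: equal doubles are identical
  const a = 'hello';
  const b = 'hello';
  print(identical(a, b));                       // true: constant strings are canonicalized
  print(identical(const [1, 2], const [1, 2])); // true: constant lists are canonicalized
  print(identical([1, 2], [1, 2]));             // false: ordinary objects, plain reference identity
}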

I can't remember the source for this, but somewhere on dartlang.org or the issue tracker it was said that num, int and double are indeed getting special treatment. One of those special treatments is that you can't subclass those types for performance reasons, but there may be more. What exactly this special treatment entails can probably only be answered by the developers, or maybe someone who knows the specification by heart, but one thing can be inferred:
The numeric types are dart objects - they have methods you can call on their instances. But they also have qualities of primitive data types, as you can do int i = 3;, while a pure object should have a new keyword somewhere. This is different from Java, where there are real primitive types and real objects wrapping them and exposing instance methods.
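To make that concrete, a tiny sketch (only standard int and double members, nothing exotic assumed):
main() {
  int i = 3;               // assigned like a primitive: no constructor call needed
  print(i.isEven);         // false: a getter invoked on an int instance
  print(3.14.truncate());  // 3: doubles expose instance methods too
  print(i.runtimeType);    // int: it is a full-blown object
}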
While the technical details certainly are more complex, if you think about dart numerics as a blend of object and primitive, your comparison to Java still makes sense. In Java, new Integer(5).equals(new Integer(5)) evaluates to true, and so does 5==5.
I am aware this is not a very technically correct answer, but I hope it's still useful to make sense of the behaviour of dart numerics when coming from a Java background.

Related

Equal objects with different identity?

Is it possible in Rascal to create clones of an object with different identity so that they are equal but not identical?
No, Rascal has value semantics.
data X = x();
bool alwaysTrue = x() == x();
Even using closures (functions as data), you cannot construct two distinguishable instances a and b which will still return true on a == b. The reason is that closures are never considered equal unless you have an alias pointing to the same instance.
There is also no clone operation or anything like that. There are just expressions and their result is isomorphic to the expression tree that created them.
Semantically Rascal does not guarantee that all values on the heap are actually shared or that they are just indistinguishable, so the memory optimisation perspective is left entirely to the run-time implementation.

How to compare Rails "executables" before and after refactor?

In C, I could generate an executable, do an extensive rename-only refactor, then compare the two executables to confirm that the binary did not change. This was very handy to ensure that the refactor did not break anything.
Has anyone done anything similar with Ruby, particularly a Rails app? Strategies and methods would be appreciated. Ideally, I could run a script that output a single file of some sort that was purely bytecode and was not changed by naming changes. I'm guessing JRuby or Rubinius would be helpful here.
I don't think this strategy will work for Ruby. Unlike C, where the compiler throws away the names, most of the things you name in Ruby carry that name with them. That includes classes, modules, constants, and instance variables.
Automated unit and integration tests are the way to go to support Ruby refactoring.
Interesting question -- I like the definitive "yes" answer you can get from this regression strategy, at least for the specific case of rename refactoring.
I'm not expert enough to tell whether you can compile ruby (or at least a subset, without things like eval) but there seem to be some hints at:
http://www.hokstad.com/the-problem-with-compiling-ruby.html
http://rubini.us/2011/03/17/running-ruby-with-no-ruby/
Supposing that a complete compilation isn't possible, what about an abstract interpretation approach? You could parse the ruby into an AST, emit some kind of C code from the AST, and then compile the C code. The C code would not need to fully capture the behavior of the ruby code. It would only need to be compilable and to be distinct whenever the ruby was distinct. (Actually running it could result in gibberish, or perhaps an immediate memory violation error.)
As a simple example, suppose that ruby supported multiplication and C didn't. Then you could include a static mult function in your C code and translate from:
a = b + c*d
to
a = b + mult(c,d)
and the resulting compiled code would be invariant under name refactoring but would show discrepancies under other sorts of change. The mult function need not actually implement multiplication; you could have one of these instead:
static int mult( int a, int b ) { return a + b; }      // pretty close
static int mult( int a, int b ) { return *(int*)0; }   // not close at all (crashes if run), but still sufficient
and you'd still get the invariance you need as long as the C compiler isn't going to inline the definition. The same sort of translation, from an uncompilable ruby construct to a less functional but distinct C construct, should work for object manipulation and so forth, mapping class operations into C structure references. The key point is just that you want to keep the naming relationships intact while sacrificing actual behavior.
(I wonder whether you could do something with a single C struct that has members (all pointers to the same struct type) named after all the class and property names in the ruby code. Class and object operations would then correspond to nested dereference operations using this single structure. Just a notion.)
Even if you cannot formulate a precise mapping, an imprecise mapping that misses some minor distinctions might still be enough to increase confidence in the original name refactoring.
The quickest way to implement such a scheme might be to map from byte code to C (rather from the ruby AST to C). That would save a lot of parsing, but the mapping would be harder to understand and verify.

How safe is comparing numbers in lua with equality operator?

In my engine I have a Lua VM for scripting. In the scripts, I write things like:
stage = stage + 1
if (stage == 5) then ... end
and
objnum = tonumber("5")
if (stage == objnum) then ... end
According to the Lua sources, Lua uses a simple equality operator when comparing doubles, the internal number type it uses.
I am aware of precision problems when dealing with floating point values, so I want to know if the comparison is safe, that is, will there be any problems with simply comparing these numbers using Lua's default '==' operation? If so, are there any countermeasures I can employ to make sure 1+2 always compares as equal to 3? Will converting the values to strings work?
You may be better off by converting to string and then comparing the results if you only care about equality in some cases. For example:
> print(21, 0.07*300, 21 == 0.07*300, tostring(21) == tostring(0.07*300))
21 21 false true
I learned this the hard way when I gave my students an assignment with these numbers (0.07 and 300) and asked them to implement a unit test that then miserably failed, complaining that 21 is not equal to 21 (it was comparing actual numbers, but displaying stringified values). It was a good reason for us to have a discussion about comparing floating point values.
"are there any countermeasures I can employ to make sure 1+2 always compares as equal to 3?"
You needn't worry. The number type in Lua is double, which can represent every integer with magnitude up to 2^53 exactly, many more than a 32-bit long int can hold.
Comparison and basic operations on doubles are safe in certain situations, in particular if the numbers and their result can be expressed exactly, which includes all low-valued integers.
So 2+1 == 3 will be fine for doubles.
NOTE: I believe there are even some guarantees for certain math functions (like pow and sqrt), and if your compiler/library respects those then sqrt(4.0)==2.0 or 4.0 == pow(2.0,2.0) will reliably be true.
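Where you cannot guarantee exact representability, the usual countermeasure is a tolerance comparison instead of a plain ==. A minimal sketch of the idea, written in Dart to match the main question on this page (the helper name nearlyEqual and the 1e-9 tolerance are just illustrative choices; the same few lines translate directly into a Lua function):
import 'dart:math' as math;

// Treat a and b as equal when they differ by less than a tolerance
// scaled to their magnitude.
bool nearlyEqual(double a, double b, [double eps = 1e-9]) {
  final scale = math.max(math.max(a.abs(), b.abs()), 1.0);
  return (a - b).abs() <= eps * scale;
}

main() {
  print(0.07 * 300 == 21.0);            // false: 0.07 has no exact binary representation
  print(nearlyEqual(0.07 * 300, 21.0)); // true
  print(1.0 + 2.0 == 3.0);              // true: small integer values are exact in a double
}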
By default, Lua's number type is a C double, and behind the scenes number comparisons boil down to floating-point comparisons in C, which are indeed problematic and discussed in several threads, e.g. most-effective-way-for-float-and-double-comparison.
Lua makes the situation only slightly worse by storing all numbers, including integers, as doubles. So you need to keep that in mind.

id values of different variables in python 3

I am able to understand immutability with Python (surprisingly simple, too). Let's say I assign a number to x:
x = 42
print(id(x))
print(id(42))
On both counts, the value I get is
505494448
My question is, does python interpreter allot ids to all the numbers, alphabets, True/False in the memory before the environment loads? If it doesn't, how are the ids kept track of? Or am I looking at this in the wrong way? Can someone explain it please?
What you're seeing is an implementation detail (an internal optimization) called interning. This is a technique (used by the implementations of a number of languages, including Java and Lua) which aliases names or variables to be references to single object instances where that's possible or feasible.
You should not depend on this behavior. It's not part of the language's formal specification and there are no guarantees that separate literal references to a string or integer will be interned nor that a given set of operations (string or numeric) yielding a given object will be interned against otherwise identical objects.
I've heard that the CPython implementation does include a set of the first hundred or so integers as statically instantiated immutable objects. I suspect that other very high level language run-time libraries are likely to include similar optimizations: the first hundred integers are used very frequently by most non-trivial fragments of code.
In terms of how such things are implemented ... for strings and larger integers it would make sense for Python to maintain these as dictionaries. Thus any expression yielding an integer (and perhaps even floats) and strings (at least sufficiently short strings) would be hashed, looked up in the appropriate (internal) object dictionary, added if necessary and then returned as references to the resulting object.
You can do your own similar interning of any sorts of custom object you like by wrapping the instantiation in your own calls to your own class static dictionary.
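As a sketch of that idea, shown in Dart (the language of the main question on this page) rather than Python: wrap construction in a factory that consults a static cache. The Tag class and its field are invented purely for illustration:
// Hypothetical class that interns its instances: equal names share one object.
class Tag {
  final String name;

  static final Map<String, Tag> _cache = <String, Tag>{};

  // The factory consults the cache, so repeated construction with the same
  // name returns the same canonical instance.
  factory Tag(String name) => _cache.putIfAbsent(name, () => Tag._(name));

  Tag._(this.name);
}

main() {
  print(identical(Tag('red'), Tag('red')));  // true: both calls hit the cached instance
  print(identical(Tag('red'), Tag('blue'))); // false: different cache entries
}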

Explaining pattern matching vs switch

I have been trying to explain the difference between switch statements and pattern matching (F#) to a couple of people, but I haven't really been able to explain it well... most of the time they just look at me and say "so why don't you just use if..then..else".
How would you explain it to them?
EDIT! Thanks everyone for the great answers, I really wish I could mark multiple right answers.
Having formerly been one of "those people", I don't know that there's a succinct way to sum up why pattern-matching is such tasty goodness. It's experiential.
Back when I had just glanced at pattern-matching and thought it was a glorified switch statement, I think that I didn't have experience programming with algebraic data types (tuples and discriminated unions) and didn't quite see that pattern matching was both a control construct and a binding construct. Now that I've been programming with F#, I finally "get it". Pattern-matching's coolness is due to a confluence of features found in functional programming languages, and so it's non-trivial for the outsider-looking-in to appreciate.
I tried to sum up one aspect of why pattern-matching is useful in the second of a short two-part blog series on language and API design; check out part one and part two.
Patterns give you a small language to describe the structure of the values you want to match. The structure can be arbitrarily deep and you can bind variables to parts of the structured value.
This allows you to write things extremely succinctly. You can illustrate this with a small example, such as a derivative function for a simple type of mathematical expressions:
type expr =
    | Int of int
    | Var of string
    | Add of expr * expr
    | Mul of expr * expr;;

let rec d(f, x) =
    match f with
    | Var y when x=y -> Int 1
    | Int _ | Var _ -> Int 0
    | Add(f, g) -> Add(d(f, x), d(g, x))
    | Mul(f, g) -> Add(Mul(f, d(g, x)), Mul(g, d(f, x)));;
Additionally, because pattern matching is a static construct for static types, the compiler can (i) verify that you covered all cases, (ii) detect redundant branches that can never match any value, and (iii) provide a very efficient implementation (with jumps etc.).
Excerpt from this blog article:
Pattern matching has several advantages over switch statements and method dispatch:
- Pattern matches can act upon ints, floats, strings and other types as well as objects.
- Pattern matches can act upon several different values simultaneously: parallel pattern matching. Method dispatch and switch are limited to a single value, e.g. "this".
- Patterns can be nested, allowing dispatch over trees of arbitrary depth. Method dispatch and switch are limited to the non-nested case.
- Or-patterns allow subpatterns to be shared. Method dispatch only allows sharing when methods are from classes that happen to share a base class. Otherwise you must manually factor out the commonality into a separate function (giving it a name) and then manually insert calls from all appropriate places to your unnecessary function.
- Pattern matching provides redundancy checking which catches errors.
- Nested and/or parallel pattern matches are optimized for you by the F# compiler. The OO equivalent must be written by hand and constantly reoptimized by hand during development, which is prohibitively tedious and error prone, so production-quality OO code tends to be extremely slow in comparison.
- Active patterns allow you to inject custom dispatch semantics.
Off the top of my head:
- The compiler can tell if you haven't covered all possibilities in your matches
- You can use a match as an assignment
- If you have a discriminated union, each match can have a different 'type'
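These points are not F#-specific. For readers arriving from the Dart question at the top of this page: Dart (since version 3) has switch expressions, object patterns and sealed classes, and a small sketch shows the "match as an assignment" and exhaustiveness points (the Shape hierarchy is invented for illustration):
// A sealed hierarchy gives the compiler a closed set of cases to check.
sealed class Shape {}

class Circle extends Shape {
  final double radius;
  Circle(this.radius);
}

class Square extends Shape {
  final double side;
  Square(this.side);
}

double area(Shape s) =>
    // A switch *expression*: the whole match is a value you can assign or
    // return, and omitting Circle or Square is a compile-time error.
    switch (s) {
      Circle(radius: var r) => 3.14159265358979 * r * r,
      Square(side: var d) => d * d,
    };

main() {
  print(area(Circle(1.0)));
  print(area(Square(2.0)));
}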
Tuples have "," and Variants have Ctor args .. these are constructors, they create things.
Patterns are destructors, they rip them apart.
They're dual concepts.
To put this more forcefully: the notion of a tuple or variant cannot be described merely by its constructor: the destructor is required or the value you made is useless. It is these dual descriptions which define a value.
Generally we think of constructors as data, and destructors as control flow. Variant destructors are alternate branches (one of many), tuple destructors are parallel threads (all of many).
The parallelism is evident in operations like
(f * g) . (h * k) = (f . h) * (g . k)
if you think of control flowing through a function, tuples provide a way to split up a calculation into parallel threads of control.
Looked at this way, expressions are ways to compose tuples and variants to make complicated data structures (think of an AST).
And pattern matches are ways to compose the destructors (again, think of an AST).
Switch is the two front wheels.
Pattern-matching is the entire car.
Pattern matches in OCaml, in addition to being more expressive in the several ways described above, also give some very important static guarantees. The compiler will prove for you that the case analysis embodied by your pattern-match statement is:
- exhaustive (no cases are missed)
- non-redundant (no cases that can never be hit because they are pre-empted by a previous case)
- sound (no patterns that are impossible given the datatype in question)
This is a really big deal. It's helpful when you're writing the program for the first time, and enormously useful when your program is evolving. Used properly, match-statements make it easier to change the types in your code reliably, because the type system points you at the broken match statements, which are a decent indicator of where you have code that needs to be fixed.
If-Else (or switch) statements are about choosing different ways to process a value (input) depending on properties of the value at hand.
Pattern matching is about defining how to process a value given its structure (note also that single-case pattern matches make sense).
Thus pattern matching is more about deconstructing values than about making choices, which makes it a very convenient mechanism for defining (recursive) functions on inductive structures (recursive union types), and explains why it is so abundantly used in languages like OCaml etc.
PS: You might know the pattern-match and If-Else "patterns" from their ad-hoc use in math:
"if x has property A then y else z" (If-Else)
"some term in p1..pn where .... is the prime decomposition of x.." ((single case) pattern match)
Perhaps you could draw an analogy with strings and regular expressions? You describe what you are looking for, and let the compiler figure out how for itself. It makes your code much simpler and clearer.
As an aside: I find that the most useful thing about pattern matching is that it encourages good habits. I deal with the corner cases first, and it's easy to check that I've covered every case.
