When we write:
"exampleString".hashCode
Is there a general mathematical method, some algorithm that calculates it, or does the Dart language get it from somewhere else?
And is the hash code of a String value the same in other languages such as Java or C++?
To answer your questions, hashcodes are Dart specific, and perhaps even execution-run specific. To see how the various hashcodes work, see the source code for any given class.
The Fine Manual says:
A hash code is a single integer which represents the state of the object that affects operator == comparisons.
All objects have hash codes. The default hash code implemented by Object represents only the identity of the object, the same way as the default operator == implementation only considers objects equal if they are identical (see identityHashCode).
If operator == is overridden to use the object state instead, the hash code must also be changed to represent that state, otherwise the object cannot be used in hash based data structures like the default Set and Map implementations.
Hash codes must be the same for objects that are equal to each other according to operator ==. The hash code of an object should only change if the object changes in a way that affects equality. There are no further requirements for the hash codes. They need not be consistent between executions of the same program and there are no distribution guarantees.
Objects that are not equal are allowed to have the same hash code. It is even technically allowed that all instances have the same hash code, but if clashes happen too often, it may reduce the efficiency of hash-based data structures like HashSet or HashMap.
If a subclass overrides hashCode, it should override the == operator as well to maintain consistency.
That last point is important. If two objects are considered == by whatever strategy you want, they must also always have the same hash code. The converse is not necessarily true: two objects with the same hash code need not be equal.
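The same contract exists in most languages with hash-based collections. As a minimal sketch, written in Python rather than Dart and using hypothetical class names (Point, SameHash), the following shows equal objects sharing a hash code, and unequal objects that share a hash code while still being kept apart by a set:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Derived from exactly the fields that __eq__ compares.
        return hash((self.x, self.y))


class SameHash:
    # Equality stays identity-based; every instance reports the same hash.
    def __hash__(self):
        return 1


a = Point(1, 2)
b = Point(1, 2)
print(a == b, hash(a) == hash(b))  # equal objects, equal hash codes
print({a: "found"}[b])             # b finds the entry stored under a

s1, s2 = SameHash(), SameHash()
print(hash(s1) == hash(s2), s1 == s2)  # same hash code, yet not equal
print(len({s1, s2}))                   # the set still keeps both
```

A constant hash is legal, as the documentation says, but every lookup then degrades to comparing against all stored entries.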
Are there any use cases in which we'd call Array.to_a and Array.to_ary on an object that's already an Array?
If not, why do these methods exist within the Array class?
Are there any use cases in which we'd call these coercion methods (Array.to_a and Array.to_ary) on an object that's already Array?
Yes: in Ruby, you generally never care what class an object is an instance of. Only what it can do. So, you only care about "can this object convert itself to an array". Obviously, an array can convert itself to an array, so it should have those methods.
Slightly longer answer: if we don't care what class an object is an instance of … then why do we care whether it can convert itself to an array? Well, that's a pragmatic choice. From a purely OO perspective, it shouldn't matter. But there are certain operations that are implemented deep inside the core of the execution engine that require an object to be of a certain class, for efficiency and performance reasons. In other words, sometimes objects don't work, you need an Abstract Data Type.
For example, there are certain operations inside the Ruby execution engine that take advantage of the fact that they know about the internal memory layout of Arrays. Obviously, those operations will break in horrible ways if you hand them something that is not an Array and they go poking around in that object's memory. From a purely OO perspective, those operations shouldn't know that, and they should use Array's public interface, but alas, they don't. But, in order to give you (the programmer) an escape hatch for your own array-like objects, those operations will allow you to convert yourself to an Array first, by calling to_ary.
In other words, implementing to_ary means that your object is a kind-of array. Obviously, an array is a kind-of array, that's why it responds to to_ary.
There are other similar conversion methods in Ruby: to_str for strings, to_int for integers, to_float for floats, to_proc for "functions".
There are also their single-letter variants. The long variants mean "I really am an array, I just don't happen to be an instance of the Array class." The short variants, instead, mean "I can kinda-sorta represent myself as an array".
You can see that most clearly with nil: it responds to to_i (because it kinda-sorta makes sense to represent nil as the integer 0), but it doesn't respond to to_int (because nil is not an integer in different clothing, it is something completely different).
The fact that arrays, integers, etc. also implement to_a, to_ary, to_i, to_int, etc. means that you can treat all array-like objects the same, polymorphically. It doesn't matter if it's an array, a stack, a set, a tree, an enumerator, a range, or whatever. As long as it can kinda-sorta be represented as an array, it will respond to to_a, and as long as it actually is an array (even if its class isn't Array), it will respond to to_ary, and you don't have to check because it doesn't matter.
However, note that these situations ideally should be rare. In general, you should care about, say, whether the object can iterate itself (i.e. it responds to each). In fact, most of the things you can do with an array, you can also do with any other Enumerable, without using to_ary or to_a. Those should be the last resort.
I think it's (among other things) to avoid having a special case for nil.
Let's say foo is a method that can either return an array or nil. The snippet below would fail whenever foo returns nil:
foo.each { |x| puts x }
If Array didn't implement a to_a method, you'd probably have to write something like this, which in my opinion is a bit ugly:
(foo || []).each { |x| puts x }
Instead of this:
foo.to_a.each { |x| puts x }
In a similar fashion, Integer has a to_i method and String has a to_s method, and so on.
Array implements to_a and to_ary because this allows for cleaner methods that coerce arguments into specific types.
For example, what if you had a method:
def foo(object)
# does some work on object that requires object to act like an array
...
end
and wanted to use this method on sets generated from somewhere else in your code.
One way you can do this is to cast object.to_a before doing the operation:
def foo(object)
array = object.to_a
...
end
If Array didn't implement to_a, then you'd have to do a check:
def foo(object)
array = object.to_a if object.respond_to?(:to_a)
...
end
Say I have a NSArray, and each item is an NSDictionary with three keys keyA, keyB, and keyC - each referring to objects of unknown type (id).
If I wanted to write a method that found the given element with those three keys i.e.
-(NSDictionary *) itemThatContainsKeys:(id)objectA and:(id)objectB and:(id)objectC
would I run into trouble by simply enumerating through and testing object equality via if ([[i objectForKey:keyA] isEqualTo:objectA]) etc.? I would be passing in the actual objects that were set in the dictionary initialization, i.e. not strings with the same value but different locations.
Is this bad practise?
Is there a better way to do this without creating a database?
You can override isEqual: to stipulate the notion of equality for your type. The same rules apply as in other languages:
If you provide an implementation of equals you should provide an implementation of 'hash'
Objects that are 'equal' should have the same 'hash'
Equals should be transitive -> if A equals B, and B equals C, then A must equal C.
Equals should be symmetric -> if A equals B, then B must equal A.
This will ensure predictable behavior in classes like NSSet, which use hash for performance and fall back to isEqual: when there's a collision.
As Jason Whitehorn notes, Objective-C also has the convention of providing another isEqualToMyType method for convenience.
AppCode, EqualsBuilder, Boiler-plate code
It would be nice if there was something like Apache's 'EqualsBuilder' class, but in the meantime AppCode does a fine job of implementing these methods for you.
The isEqual: method compares object identity unless overridden by a subclass. Depending on what class the target is, this may or may not be what you want. What I prefer is to use a more class-specific comparison like isEqualToNumber:, simply because of its explicitness. But isEqual: should work, depending on the target.
Other than that, and not knowing more specifics of what you're doing, it's hard to say if there is a better way to accomplish what you're after. But here are my thoughts:
An array of a dictionary almost sounds like you might need a custom class to represent some construct in your app. Perhaps the dictionary could be replaced with a custom object on which you implement an isEqualToAnotherThing: method. This should simplify your logic.
My immediate project is to develop a system of checksums for proving that two somewhat complex objects are (functionally) equal, in the sense that they have the same values for the critical properties. (I have discovered that dates and times cannot be included, so for my purposes I can't use JSON on the bigger object - duh :) )
To do this calling the hashCode() method on selected strings seemed to be the way to go.
Upon implementing this, I note that in practice I am getting very different values on multiple runs of highest level objects that are functionally 'identical'.
There are a number of "nums" that I have not rounded, there are integers, bools, Strings and not much more.
I have 'always' thought that a hashCode on the same set of values would return the same number, am I missing something?
BTW the only context that I have found material on hashCode() has been with WebSockets.
Of course I can write my own String to a unique value but I want to understand if this is a problem with Dart or something else.
I can attempt to answer the question posed in the title: "Can hashCode() method calls return different values on equal (==) Objects?"
Short answer: hash codes for two objects must be the same if those two objects are equal (==).
If you override hashCode you must also override equals. Two objects that are equal, as defined by ==, must also have the same hash code.
However, hash codes do not have to be unique. That is, a perfectly valid hash code is the value 1. A good hash code, however, should be uniformly distributed.
From the docs from Object:
Hash codes are guaranteed to be the same for objects that are equal
when compared using the equality operator ==. Other than that there
are no guarantees about the hash codes. They will not be consistent
between runs and there are no distribution guarantees.
If a subclass overrides hashCode it should override the equality
operator as well to maintain consistency.
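Because hash codes are not guaranteed to be stable between runs, a checksum that must be reproducible is usually better built from a deterministic digest of a canonical serialization. A minimal sketch, in Python rather than Dart; the function name stable_checksum and the sample properties are hypothetical, and it assumes the critical properties are plain JSON-compatible values with volatile fields such as dates already excluded:

```python
import hashlib
import json


def stable_checksum(properties):
    # Canonical form: keys sorted and separators fixed, so key order
    # and whitespace cannot change the digest.
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same critical properties, different key order; the float is rounded first.
first = {"name": "study-1", "samples": 3, "mean": round(0.1 + 0.2, 6), "valid": True}
second = {"valid": True, "mean": 0.3, "samples": 3, "name": "study-1"}

print(stable_checksum(first) == stable_checksum(second))
```

Rounding floating-point values before serializing matters here: without it, 0.1 + 0.2 and 0.3 serialize differently and produce different digests.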
I found the immediate problem. The object stringify() method, at one level, was not getting called, but rather some stringify property that must exist in all objects (?).
With this fixed, everything is working exactly as I would expect, and multiple runs of our Statistical Studies are returning exactly the same CheckSum at the highest levels (based on some 5 levels of hierarchy).
Meanwhile, JSON.stringify has continued to fail, even on the most basic object. I have not been able to determine what is causing it to fail. Of course, the question is not how "stringify" is accomplished.
So, empirically at least, I believe it is true that "objects with equal properties" will return equal checkSums in Dart. It was decided to round nums; I don't know if this was causing a problem, but it is perhaps good to be aware of. And, of course, remember to beware of things like dates, times, or anything that could legitimately vary.
_swarmii
The doc linked by Seth Ladd now includes this:
They need not be consistent between executions of the same program and there are no distribution guarantees.
So technically the hashCode value of the same object can change between executions, which answers your question:
I have 'always' thought that a hashCode on the same set of values would return the same number, am I missing something?
I am able to understand immutability with Python (surprisingly simple too). Let's say I assign a number to a variable:
x = 42
print(id(x))
print(id(42))
On both counts, the value I get is
505494448
My question is, does python interpreter allot ids to all the numbers, alphabets, True/False in the memory before the environment loads? If it doesn't, how are the ids kept track of? Or am I looking at this in the wrong way? Can someone explain it please?
What you're seeing is an implementation detail (an internal optimization) called interning. This is a technique (used by implementations of a number of languages including Java and Lua) which aliases names or variables to be references to single object instances where that's possible or feasible.
You should not depend on this behavior. It's not part of the language's formal specification and there are no guarantees that separate literal references to a string or integer will be interned nor that a given set of operations (string or numeric) yielding a given object will be interned against otherwise identical objects.
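A short sketch of what that means in practice: compare values with ==, and treat identity (is, or matching id values) as unspecified unless you interned the object explicitly. The variable names are illustrative; sys.intern is the standard-library call for interning strings on request.

```python
import sys

big_a = int("1000000")
big_b = int("1000000")
print(big_a == big_b)  # value comparison: reliable
print(big_a is big_b)  # identity: unspecified, depends on the interpreter

# Explicit interning is the one case where identity is dependable:
# sys.intern returns the single canonical object for a given string value.
s1 = sys.intern("".join(["hash", "code"]))
s2 = sys.intern("hashcode")
print(s1 is s2)
```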
The CPython implementation does preallocate a small range of integers (-5 through 256 in current releases) as statically instantiated immutable objects. I suspect that other very high level language run-time libraries are likely to include similar optimizations: small integers are used very frequently by most non-trivial fragments of code.
In terms of how such things are implemented ... for strings and larger integers it would make sense for Python to maintain these as dictionaries. Thus any expression yielding an integer (and perhaps even floats) and strings (at least sufficiently short strings) would be hashed, looked up in the appropriate (internal) object dictionary, added if necessary and then returned as references to the resulting object.
You can do your own similar interning of any sorts of custom object you like by wrapping the instantiation in your own calls to your own class static dictionary.
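As a rough sketch of that idea, assuming a hypothetical Color class whose instances are identified by a case-insensitive name, instantiation can be routed through a class-level dictionary so that equal values always map to one shared object:

```python
class Color:
    _cache = {}  # class-level dictionary: normalized name -> shared instance

    def __new__(cls, name):
        key = name.lower()
        instance = cls._cache.get(key)
        if instance is None:
            # First request for this value: create and remember it.
            instance = super().__new__(cls)
            instance.name = key
            cls._cache[key] = instance
        return instance


red_one = Color("Red")
red_two = Color("red")
print(red_one is red_two)            # both names refer to one object
print(id(red_one) == id(red_two))
print(Color("blue") is red_one)      # a different value gets its own object
```

This only makes sense for immutable objects, since every holder of the shared instance would see a mutation.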
In the process of transforming a given efficient pointer-based hash map implementation into a generic hash map implementation, I stumbled across the following problem:
I have a class representing a hash node (the hash map implementation uses a binary tree)
THashNode <KEY_TYPE, VALUE_TYPE> = class
public
Key : KEY_TYPE;
Value : VALUE_TYPE;
Left : THashNode <KEY_TYPE, VALUE_TYPE>;
Right : THashNode <KEY_TYPE, VALUE_TYPE>;
end;
In addition to that there is a function that should return a pointer to a hash node. I wanted to write
PHashNode = ^THashNode <KEY_TYPE, VALUE_TYPE>
but that doesn't compile (';' expected but '<' found).
How can I have a pointer to a generic type?
And addressed to Barry Kelly, if you read this: yes, this is based on your hash map implementation. You haven't written such a generic version of your implementation yourself, have you? That would save me some time :)
Sorry, Smasher. Pointers to open generic types are not supported because generic pointer types are not supported, although it is possible (compiler bug) to create them in certain circumstances (particularly pointers to nested types inside a generic type); this "feature" can't be removed in an update in case we break someone's code. The limitation on generic pointer types ought to be removed in the future, but I can't make promises when.
If the type in question is the one in JclStrHashMap I wrote (or the ancient HashList unit), well, the easiest way to reproduce it would be to change the node type to be a class and pass around any double-pointers as Pointer with appropriate casting. However, if I were writing that unit again today, I would not implement buckets as binary trees. I got the opportunity to write the dictionary in the Generics.Collections unit, though with all the other Delphi compiler work time was too tight before shipping for solid QA, and generic feature support itself was in flux until fairly late.
I would prefer to implement the hash map buckets as one of double-hashing, per-bucket dynamic arrays or linked lists of cells from a contiguous array, whichever came out best from tests using representative data. The logic is that cache miss cost of following links in tree/list ought to dominate any difference in bucket search between tree and list with a good hash function. The current dictionary is implemented as straight linear probing primarily because it was relatively easy to implement and worked with the available set of primitive generic operations.
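For readers unfamiliar with the term, linear probing stores every entry directly in one array and resolves a collision by stepping to the next free slot. The following is a minimal illustrative sketch in Python, not the Generics.Collections implementation; the class name LinearProbingMap is hypothetical, and resizing and deletion are left out to keep it short.

```python
class LinearProbingMap:
    """Fixed-capacity open-addressing map using linear probing."""

    def __init__(self, capacity=8):
        # Each slot is either None (empty) or a (key, value) tuple.
        self._slots = [None] * capacity

    def _find_slot(self, key):
        # Start at the hashed index and step forward one slot at a time
        # until the key or an empty slot is found.
        n = len(self._slots)
        index = hash(key) % n
        for _ in range(n):
            entry = self._slots[index]
            if entry is None or entry[0] == key:
                return index
            index = (index + 1) % n
        return None  # table is full and the key is absent

    def put(self, key, value):
        index = self._find_slot(key)
        if index is None:
            raise RuntimeError("map is full; a real implementation would resize")
        self._slots[index] = (key, value)

    def get(self, key, default=None):
        index = self._find_slot(key)
        if index is None or self._slots[index] is None:
            return default
        return self._slots[index][1]


table = LinearProbingMap(capacity=8)
table.put(1, "a")
table.put(9, "b")      # 9 % 8 == 1, so this collides and lands in the next slot
print(table.get(1), table.get(9), table.get(17))
```

Colliding entries sit in adjacent array cells, which is the cache-friendliness argument made above: a probe sequence touches contiguous memory instead of following tree or list links.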
That said, the binary tree buckets should have been an effective hedge against poor hash functions; if they were balanced binary trees (=> even more modification cost), they would be O(1) on average and O(log n) worst case performance.
To actually answer your question, you can't make a pointer to a generic type, because "generic types" don't exist. You have to make a pointer to a specific type, with the type parameters filled in.
Unfortunately, the compiler doesn't like finding angle brackets after a ^. But it will accept the following:
TGeneric<T> = record
value: T;
end;
TSpecific = TGeneric<string>;
PGeneric = ^TSpecific;
But "PGeneric = ^TGeneric<string>;" gives a compiler error. Sounds like a glitch to me. I'd report that over at QC if I was you.
Why are you trying to make a pointer to an object, anyway? Delphi objects are a reference type, so they're pointers already. You can just cast your object reference to Pointer and you're good.
If Delphi supported generic pointer types at all, it would have to look like this:
type
PHashNode<K, V> = ^THashNode<K, V>;
That is, mention the generic parameters on the left side where you declare the name of the type, and then use those parameters in constructing the type on the right.
However, Delphi does not support that. See QC 66584.
On the other hand, I'd also question the necessity of having a pointer to a class type at all. Generic or not, they are needed only very rarely.
There's a generic hash map called TDictionary in the Generics.Collections unit. Unfortunately, it's badly broken at the moment, but it's apparently going to be fixed in update #3, which is due out within a matter of days, according to Nick Hodges.