Why are arrays not implemented as hashes with numerical keys? - ruby-on-rails

Having returned to development after an absence of over a decade, I am getting myself up to speed with the latest technologies for web development. Reading this post I see that I already understood the difference between hashes and arrays.
However, doesn't this mean that arrays are just a type of hash that uses a numerical key? As there is no reason to believe that an implementation of an array will automatically maintain the sequential nature of the array indices (when you delete or insert items for example), is there any greater difference than the inherent ordering of an array?
I mean, to step through an array, you need to set up a loop through the indices, the same as looping through the keys of a hash, and you could order the numerical hash key set to behave the same way (i.e. access the items from 1 to the last number that is a key in the hash, in numerical sequence). To access an array element, you use the index of the value you want, the same as giving the numerical key from the hash.
I came to this question while learning about arrays and hashes in Ruby on Rails, but it is a general question.

A Hash is essentially built on an array. A Hash applies a conversion function (the hash function) to translate a key of another type, or an integer drawn from a wider range of values, into the integer index of an array. Conversely, an array can be viewed as a Hash that does not translate its keys into a separate type or index value. However, calling an array a Hash implies an extra layer of functionality that does not exist, since there is no key conversion.
By definition, the objects of an array are stored in consecutive locations in memory, accessible by index.
Even when either type of data structure can be used, one benefit of using hashes with Integer keys is that a wide spread of integers can be stored in a small number of buckets. E.g., if your keys are 5 integers near 1, 10, 100, 1000 and 10000, you don't need 10K buckets to hold these 5 elements in a hash, but you do need that many slots if you use a straight-up array. On the other hand, hashes tend to rehash their keys and reallocate memory as they grow, whereas the size of an array can be more easily controlled and can remain fixed.
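To make the sparse-key point concrete, here is a minimal sketch in Python, assuming Python's dict stands in for the hash and list for the array. The variable names are illustrative only and do not come from the answer above.

```python
# Five sparse integer keys, as in the example above.
keys = [1, 10, 100, 1000, 10000]

# Hash (dict): one entry per key, no matter how far apart the keys are.
sparse_hash = {k: f"value-{k}" for k in keys}

# Plain array (list): needs a slot for every index up to the largest key.
dense_array = [None] * (max(keys) + 1)
for k in keys:
    dense_array[k] = f"value-{k}"

print(len(sparse_hash))   # number of stored entries
print(len(dense_array))   # number of allocated slots
```

Both containers return the same value for a given key. The difference is only in how many slots have to exist to hold the five values.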

Here is how declarative programming defines it.
Difference between Declarative and Procedural Programming?
http://en.wikipedia.org/wiki/Procedural_programming
http://en.wikipedia.org/wiki/Declarative_programming
There are primitive, composite, and abstract data structures.
- An Array is composite.
- A Hash is abstract.
We have both because they are fundamentally different.
For example you can't pop/push primitives into a Hash like you can with an Array, because a Hash uses unordered key-values while an Array has an index.
http://en.m.wikipedia.org/wiki/List_of_data_structures

Related

Array vs. Dictionary - which is more expensive to initialize?

Regarding iOS Swift -
Which is heavier / more expensive to initialize - an Array or Dictionary?
This small detail matters when you're dealing with large data sets and in very large corporations that may have millions of items in an array, dictionary or some other data structure that may be based on these.
TEST RESULTS:
The test measured how much time it took to initialize 1,000,000 empty Arrays and Dictionaries, and I decided to throw Set in there too.
ANSWER: ARRAY'S LIGHTER THAN SET & DICTIONARY.
BELOW IS THE REMAINDER OF THE DESCRIPTION THAT WAS WRITTEN WHEN I ORIGINALLY WROTE THIS QUESTION:
In Java, hash map is built on top of their array.
Apple says the Dictionary is "a type" of hash table and
"similar data types are known as hashes or associated arrays."
Apple clearly says a dictionary is not a hash map / hash table, nor an associated array. It is a type of those & similar.
"A type of" doesn't mean it's some revolutionary new standard that is completely different from the other similar types, but Apple is clear that they are not the same. It may differ in how they choose to calculate the hash, how they store elements that collide at the same array index, etc.
https://developer.apple.com/documentation/swift/dictionary?fbclid=IwAR30CezlfvqpRdqjn5cnJlQmUc5Ys70GwJX7mYOKgHyDcd_kKuURgdoYnCY
I put everything in a measure block.
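The original test was written in Swift inside a measure block and is not reproduced here. As a rough analogue only, the following Python sketch times the creation of empty containers with the standard timeit module. The function name time_empty_inits is hypothetical, and Python timings say nothing about Swift's Array, Dictionary or Set.

```python
import timeit

def time_empty_inits(count=1_000_000):
    """Return the seconds taken to create `count` empty containers of each kind."""
    return {
        "list": timeit.timeit("[]", number=count),
        "dict": timeit.timeit("{}", number=count),
        "set": timeit.timeit("set()", number=count),
    }

if __name__ == "__main__":
    for name, seconds in time_empty_inits().items():
        print(f"{name}: {seconds:.4f} s")
```

Results vary by machine and run, so any comparison should be repeated several times before drawing a conclusion.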

Initialisation order in Lua Table Constructor

So, a table constructor has two components, list-like and record-like. Do list-like entries always take precedence over record-like ones? I mean, consider the following scenario:
a = {[1]=1, [2]=2, 3}
print(a[1]) -- 3
a = {1, [2]=2, [1]=3}
print(a[1]) -- 1
Is the index 1 always associated with the first list-like entry, 2 with the second, and so on? Or is there something else?
Lua has a single table type, but tables are commonly used in two ways: as arrays and as dictionaries. These are what you call "lists" and "records". An array contains values in order, which gives you a few advantages, like faster iteration or inserting/removing values. A dictionary works like a giant lookup table; it has no order, and its advantage is that you can use almost any value as a key, so you are not as restricted.
When you construct a table, you have two syntaxes. You can separate the values with commas, e.g. {2,4,6,8}, thereby creating an array with keys 1 through n, or you can define key-value pairs, e.g. {[1]=2,[58]=4,[368]=6,[48983]=8}, creating a dictionary. You can often mix these syntaxes without running into any problems, but this is not the case in your scenario.
What you are doing is defining the same key twice during the table's initial construction. This is rarely useful in practice, and the language does not define what happens: the Lua reference manual states that the order of the assignments in a constructor is undefined. The result is essentially unspecified behaviour and may be inconsistent across different platforms or implementations.
As such, you should not use this in any commercial projects, or any code you'll be sharing with other people. When in doubt, construct an empty table and define the key-value pairs afterward.

Is objectForKey slow for big NSDictionary?

Assume we have a very big NSDictionary. When we call the objectForKey: method, does it perform lots of operations internally to get the value, or does it point to the value in memory directly?
How does it work internally?
The CFDictionary section of the Collections Programming Topics for Core Foundation (which you should look into if you want to know more) states:
A dictionary—an object of the CFDictionary type—is a hashing-based
collection whose keys for accessing its values are arbitrary,
program-defined pieces of data (or pointers to data). Although the key
is usually a string (or, in Core Foundation, a CFString object), it
can be anything that can fit into the size of a pointer—an integer, a
reference to a Core Foundation object, even a pointer to a data
structure (unlikely as that might be).
This is what Wikipedia has to say about hash tables:
Ideally, the hash function should map each possible key to a unique
slot index, but this ideal is rarely achievable in practice (unless
the hash keys are fixed; i.e. new entries are never added to the table
after it is created). Instead, most hash table designs assume that
hash collisions—different keys that map to the same hash value—will
occur and must be accommodated in some way. In a well-dimensioned hash
table, the average cost (number of instructions) for each lookup is
independent of the number of elements stored in the table. Many hash
table designs also allow arbitrary insertions and deletions of
key-value pairs, at constant average (indeed, amortized) cost per
operation.
The performance therefore depends on the quality of the hash. If it is good then accessing elements should be an O(1) operation (i.e. not dependent on the number of elements).
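To illustrate why lookup cost does not grow with the number of stored elements, here is a toy hash table with separate chaining, sketched in Python. It is not how CFDictionary or NSDictionary is implemented; the class name ChainedHashTable and its methods are hypothetical, and the sketch never resizes its bucket array.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining (illustrative only)."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # The hash picks one bucket; no other bucket is ever examined.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite the existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashTable()
table.put("apple", 1)
table.put("pear", 2)
table.put("apple", 3)   # same key: replaces the earlier value
print(table.get("apple"), table.get("pear"), table.get("missing"))  # 3 2 None
```

A lookup hashes the key once and scans a single bucket. With a good hash and enough buckets, each bucket holds only a few entries, which is where the O(1) average comes from; a poor hash that sends every key to the same bucket degrades the lookup to a linear scan.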
EDIT:
In fact, after reading further in the Collections Programming Topics for Core Foundation, Apple gives an answer to your question:
The access time for a value in a CFDictionary object is guaranteed to
be at worst O(log N) for any implementation, but is often O(1)
(constant time). Insertion or deletion operations are typically in
constant time as well, but are O(N*log N) in the worst cases. It is
faster to access values through a key than accessing them directly.
Dictionaries tend to use significantly more memory than an array with
the same number of values.
NSDictionary is essentially a hash table, so the Big-O for lookup is O(1). However, to avoid reallocations (and to achieve the O(1) complexity), you should use dictionaryWithCapacity: to create a new dictionary sized appropriately for your dataset.

Sorted TStringList, how does the sorting work?

I'm simply curious as lately I have been seeing the use of Hashmaps in Java and wonder if Delphi's Sorted String list is similar at all.
Does the TStringList object generate a Hash to use as an index for each item? And how does the search string get checked against the list of strings via the Find function?
I make use of Sorted TStringLists very often and I would just like to understand what is going on a little bit more.
Please assume I don't know how a hash map works, because I don't :)
Thanks
I'm interpreting this question, quite generally, as a request for an overview of lists and dictionaries.
A list, as almost everyone knows, is a container that is indexed by contiguous integers.
A hash map, dictionary or associative array is a container whose index can be of any type. Very commonly, a dictionary is indexed with strings.
For sake of argument let us call our lists L and our dictionaries D.
Lists have true random access. An item can be looked-up in constant time if you know its index. This is not the case for dictionaries and they usually resort to hash-based algorithms to achieve efficient random access.
A sorted list can perform binary search when you attempt to find a value. Finding a value, V, is the act of obtaining the index, I, such that L[I]=V. Binary search is very efficient. If the list is not sorted then it must perform linear search which is much less efficient. A sorted list can use insertion sort to maintain the order of the list – when a new item is added, it is inserted at the correct location.
You can think of a dictionary as a list of <Key,Value> pairs. You can iterate over all pairs, but more commonly you use index notation to look-up a value for a given key: D[Key]. Note that this is not the same operation as finding a value in a list – it is the analogue of reading L[I] when you know the index I.
In older versions of Delphi it was common to coax dictionary behaviour out of string lists. The performance was terrible. There was little flexibility in the contents.
With modern Delphi, there is TDictionary, a generic class that can hold anything. The implementation uses a hash and although I have not personally tested its performance I understand it to be respectable.
There are commonly used algorithms that optimally use all of these containers: unsorted lists, sorted lists, dictionaries. You just need to use the right one for the problem at hand.
TStringList holds the strings in an array.
If you call Sort on an otherwise unsorted (Sorted property = false) string list then a QuickSort is performed to sort the items.
The same happens if you set Sorted to true.
If you call Find (or IndexOf, which calls Find) on an unsorted string list (Sorted property = false), then a linear search is performed, comparing all strings from the start until a match is found. Note that even if you explicitly called Sort, the list is considered unsorted unless the Sorted property is true.
If you call Find on a sorted string list (Sorted property = true) then a binary search is performed (see http://en.wikipedia.org/wiki/Binary_search for details).
If you add a string to a sorted string list, a binary search is performed to determine the correct insertion position, all following elements in the array are shifted by one and the new string is inserted.
Because of this insertion performance gets a lot worse the larger the string list is. If you want to insert a large number of entries into a sorted string list, it's usually better to turn sorting off, insert the strings, then set Sorted back to true which performs a quick sort.
The disadvantage of that approach is that you will not be able to prevent the insertion of duplicates.
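The two loading strategies described above can be sketched in Python, assuming a plain list stands in for the string list's internal array. The helper add_sorted is hypothetical, and Python's list.sort uses Timsort rather than QuickSort, so this only mirrors the overall shape of the trade-off.

```python
import bisect

def add_sorted(items, value, allow_duplicates=True):
    """Insert value into the already-sorted list `items`, keeping it sorted."""
    pos = bisect.bisect_left(items, value)      # binary search for the slot
    if not allow_duplicates and pos < len(items) and items[pos] == value:
        return False                            # duplicate rejected
    items.insert(pos, value)                    # shifts later elements by one
    return True

words = ["pear", "apple", "fig", "apple", "kiwi"]

# Strategy 1: keep the list sorted on every insert (duplicates can be rejected).
incremental = []
for w in words:
    add_sorted(incremental, w, allow_duplicates=False)

# Strategy 2: append everything, then sort once (duplicates slip through).
bulk = list(words)
bulk.sort()

print(incremental)  # ['apple', 'fig', 'kiwi', 'pear']
print(bulk)         # ['apple', 'apple', 'fig', 'kiwi', 'pear']
```

The incremental version pays for a shift on every insert but can check for duplicates as it goes. The bulk version does less work overall for large inputs, at the cost of losing that check.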
EDIT: If you want a hash map, use TDictionary from unit Generics.Collections
You could look at the source code, since that comes with Delphi. Ctrl-Click on the "sort" call in your code.
It's a simple alphabetical sort in non-Unicode Delphi, and a slightly more complex Unicode one in later versions. You can supply your own comparison for custom sort orders. Unfortunately I don't have a recent version of Delphi so can't confirm, but I expect that under the hood there's a proper Unicode-aware and locale-aware string comparison routine. Unicode sorting/string comparison is not trivial and a little web searching will point out some of the pitfalls.
Supplying your own comparison routine is often done when you have delimited text in the strings or objects attached to them (the Objects property). In those cases you often want to sort by a property of the object or something other than the first field in the string. Or it might be as simple as wanting a numerical sort on the strings (so "2" comes after "1" rather than after "19").
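The difference between the default string ordering and a numerical ordering can be shown with a short Python sketch. It uses Python's built-in sorted with a key function in place of a Delphi comparison routine, and the sample values are made up for illustration.

```python
values = ["19", "2", "1", "100"]

alphabetical = sorted(values)            # compares character by character
numerical = sorted(values, key=int)      # compares the parsed integer values

print(alphabetical)  # ['1', '100', '19', '2']
print(numerical)     # ['1', '2', '19', '100']
```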
There is also a THashedStringList, which could be an option (especially in older Delphi versions).
BTW, the Unicode sort routines for TStringList are quite slow. If you override the TStringList.CompareStrings method then if the strings only contain Ansi characters (which if you use English exclusively they will), you can use customised Ansi string comparisons. I use my own customised TStringList class that does this and it is 4 times faster than the TStringList class for a sorted list for both reading and writing strings from/to the list.
Delphi's dictionary type (in generics-enabled versions of Delphi) is the closest thing to a hashmap that ships with Delphi. THashedStringList makes lookups faster than they would be in a sorted string list. You can do lookups using a binary search in a sorted string list, so it's faster than brute-force searches, but not as fast as a hash.
The general theory of a hash is that it is unordered, but very fast on lookup and insertion. A sorted list is reasonably fast on insertion until the size of the list gets large, although it's not as efficient as a dictionary for insertion.
The big benefit of a list is that it is ordered but a hash-lookup dictionary is not.

Delphi array elements alphanumeric sort order?

Is "alphanumeric" the best way to sort an array in Delphi?
I found this comment in some old code of my application:
" The elements of this array must be in ascending, alphanumeric
sort order."
If so, what could be the reason?
-Vas
There's no "best" way to sort the elements of an array (or any collection, for that matter). Sort order is a human concept (things are not usually sorted), so I'm guessing the comment has more to do with what your program is expecting.
More concretely, there's probably another section of code elsewhere that expects the array elements to be sorted alphanumerically. It can be something as simple as displaying them in a TreeView already ordered, so that the calling code doesn't have to sort the array first.
Arrays are represented as a contiguous memory assignment so that access is fast. Internally the compiler just does a call to GetMem asking for SizeOf(Type) * array size. There's nothing in the way the elements are sorted that affects the performance or memory size of the arrays in general. It MUST be in the program logic.
Most often an array is sorted to provide faster search times. Given a list of length L, I can compare with the midpoint (L DIV 2) and quickly determine if I need to look at the greater half, or the lesser half, and recursively continue using this pattern until I either have nothing to divide by or have found my match. This is what is called a Binary search. If the list is NOT sorted, then this type of operation is not available and instead I must inspect every item in the list until I reach the end.
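The midpoint procedure described above can be written as a short loop. This is a minimal Python sketch, with a hypothetical step counter added only to show how few probes a sorted array needs.

```python
def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 if target is absent."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2          # midpoint of the remaining range
        steps += 1
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1                # continue in the greater half
        else:
            high = mid - 1               # continue in the lesser half
    return -1, steps

data = list(range(0, 2000, 2))           # 1,000 sorted even numbers
index, steps = binary_search(data, 1998)
print(index, steps)
```

For 1,000 sorted items the loop needs at most 10 probes, whereas a linear scan of an unsorted array may have to inspect all 1,000.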
No, there is no "best way" of sorting. And that's one of the reasons why you have multiple sorting techniques out there.
With QuickSort, you even provide the comparison function where you determine what order you ultimately want.
Sorting an array in some way is useful when you're trying to do a binary search on the array. A binary search can be extremely fast, compared to other methods. But if the sort order is wrong, the search will be unable to find the record.
Other reasons to keep arrays sorted are almost always for cosmetic reasons, to decide how the array is sent to some output.
The best way to re-order an array depends on the length of the array and the type of data it contains. A QuickSort algorithm would give a fast result in most cases. Delphi uses it internally when you're working with string-lists and some other lists. Question is, do you really need to sort it? Does it really need to stay an array even?
But the best way to keep an array sorted is by keeping it sorted from the first element that you add to it! In general, I write a wrapper around my array types, which will take care of keeping the array ordered. The 'Add' method will search for the biggest value in the array that's less than or equal to the value that I want to add. I then insert the new item right after that position. To me, that would be the best solution. (With big arrays you could use the binary search method again to find the location where you need to insert the new record.) It's slower than appending records to the end, but you never have to wonder if it's sorted or not, since it is.
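A wrapper of this kind might look like the following Python sketch. The class name SortedArray and its methods are hypothetical and are not the answerer's actual code; bisect_right from the standard library finds the position just after the largest value that is less than or equal to the new one.

```python
import bisect

class SortedArray:
    """Hypothetical wrapper that keeps its items in ascending order."""

    def __init__(self):
        self._items = []

    def add(self, value):
        # bisect_right returns the position just after the last item <= value.
        pos = bisect.bisect_right(self._items, value)
        self._items.insert(pos, value)
        return pos

    def __contains__(self, value):
        # Membership test by binary search, valid because items stay sorted.
        pos = bisect.bisect_left(self._items, value)
        return pos < len(self._items) and self._items[pos] == value

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


arr = SortedArray()
for n in [5, 1, 4, 1, 3]:
    arr.add(n)
print(list(arr))  # [1, 1, 3, 4, 5]
```

Because every insertion goes through add, callers never need to check whether the data is sorted before running a binary search on it.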
