How does a hash table work for a dynamically sized array such as a linked list?

In cases where the length of an array is fixed, it makes sense to me that the time complexity of hash tables is O(1). However, I don't get how hash tables work for dynamically sized structures such as a linked list. Direct indexing is clearly not possible when the elements are scattered all over memory.

You are correct: a hash table whose bucket storage is implemented as a linked list would not be O(1), because finding a bucket would be an O(n) search. However, linked lists are not the only expandable structure.
You could, for example, use a resizable vector, such as one that doubles in size each time it needs to expand. That is directly addressable without an O(n) search, so it satisfies the O(1) condition.
Keep in mind that resizing the vector would almost certainly change the formula that allocates items into individual buckets of that vector, meaning there's a good chance you'd have to recalculate the buckets into which every existing item is stored.
That would still amortise to O(1), even with a single insert operation possibly having to do an O(n) reallocation, since the reallocations would be infrequent, and likely to become less frequent over time as the vector gets larger.
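The amortisation argument above can be made concrete with a few lines of C++. This is a sketch, not production code; `copies_for_appends` is a hypothetical helper that simply counts how many element copies a doubling buffer performs across n appends:

```cpp
#include <cassert>
#include <cstddef>

// Count the element copies performed by a doubling buffer over n appends.
// A real std::vector does equivalent bookkeeping internally.
std::size_t copies_for_appends(std::size_t n) {
    std::size_t capacity = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;    // a reallocation copies every existing element
            capacity *= 2;     // double the capacity
        }
        ++size;                // the append itself
    }
    return copies;
}
```

For 1024 appends the reallocations copy 1 + 2 + 4 + … + 512 = 1023 elements in total, i.e. under one copy per append on average, which is exactly the amortised O(1) claim.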

You can still map the elements of a linked list to a hash table. Yes, it's true we do not know the size of the list beforehand, so we cannot use a C-style (non-expandable) array to represent our hash table. This is where vectors come into play (or ArrayList if you're from Java).
A crash course on vectors: if there is no more space in the current array, make a new array of double the size and copy the previous elements into it. More formally, if we want to insert an (n+1)-th element into a full array of size n, the vector will make a new array of size 2n. On the next overflow it will create an array of size 4n, and so on.
The following code maps the (non-negative) values of a linked list into a vector-backed hash table, counting occurrences. Note that the vector must be resized before indexing; otherwise hash[root->val] is out of bounds.
void map(Node* root) {
    vector<int> hash;                        // grows as needed
    while (root) {
        if (root->val >= (int)hash.size())
            hash.resize(root->val + 1, 0);   // expand so the index is valid
        hash[root->val]++;                   // count occurrences of this value
        root = root->next;
    }
    for (size_t i = 0; i < hash.size(); i++) {
        cout << hash[i] << " ";
    }
}

Related

When would I use a linked list vs a stack in C++

I know that in a linked list you don't need preallocated memory, and insertion and deletion are really easy to do. The only thing I really know about a stack is the push and pop methods.
Linked lists are good for inserting and removing elements at random positions. In a stack, we only append to or remove from the end.
Linked list vs array (stack)
Both arrays and linked lists can be used to store linear data of similar types, but they both have some advantages and disadvantages over each other.
Following are the points in favour of linked lists.
(1) The size of an array is fixed: we must know the upper limit on the number of elements in advance. Also, generally, the allocated memory is equal to the upper limit irrespective of the usage, and in practice the upper limit is rarely reached.
(2) Inserting a new element into an array is expensive, because room has to be made for the new element, and to make room the existing elements have to be shifted.
For example, suppose we maintain a sorted list of IDs in an array id[].
id[] = [1000, 1010, 1050, 2000, 2040, …..].
And if we want to insert a new ID 1005, then to maintain the sorted order, we have to move all the elements after 1000 (excluding 1000).
Deletion is also expensive with arrays unless some special techniques are used. For example, to delete 1010 in id[], everything after 1010 has to be moved.
So linked lists provide the following two advantages over arrays:
1) Dynamic size
2) Ease of insertion/deletion
Linked lists have the following drawbacks:
1) Random access is not allowed. We have to access elements sequentially starting from the first node. So we cannot do binary search with linked lists.
2) Extra memory space for a pointer is required with each element of the list.
3) Arrays have better cache locality that can make a pretty big difference in performance.
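Drawback (1) can be seen in a few lines of C++. This is a sketch: std::forward_list stands in for a hand-rolled singly linked list, and the function names are illustrative:

```cpp
#include <forward_list>
#include <iterator>
#include <vector>

// Array: direct O(1) indexing.
int third_of_array(const std::vector<int>& arr) {
    return arr[2];               // constant time, simple pointer arithmetic
}

// Linked list: must walk node by node from the head, O(i) hops.
int third_of_list(const std::forward_list<int>& list) {
    auto it = list.begin();
    std::advance(it, 2);         // follows two 'next' pointers
    return *it;
}
```

Both return the same element, but the list version does work proportional to the index, which is exactly why binary search is off the table for linked lists.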

Hash Table With Chaining Search Time?

If I implement a hash table, I understand that the insertion is done in constant time. I also understand I can find the item in constant time if there is no collision. However, if I insert an item and chain it using a linked list in some arbitrary index and it ends up being in position 2, but it's linked 3 links down the list, is this O(n) time, for searching?
This is a misunderstanding of O(n) time. Big-O analysis has to do with the general case, not a specific instance. Intuitively, think of your hash table doing thousands or millions of lookups over time and taking a step back and judging if it's doing what a hash table is supposed to do.
If you had a completely degenerate hash table that hashed everything to the same slot, you would have O(n) lookup performance.
If n >> m where n is the number of elements stored, and m is the size of the hash table, your hash table lookup performance will degrade to O(n).
In general the performance of the hash table relates to the average chain length. If this average is a (small) constant, such that it is not a function of n, you have the desired O(1) lookup performance.
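A minimal separate-chaining table makes the "average chain length" point concrete. This is a sketch under stated assumptions: `ChainedSet` is an illustrative name, not a standard API, and the hash function is just a modulo:

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Minimal separate-chaining hash set of non-negative ints (a sketch).
struct ChainedSet {
    std::vector<std::list<int>> buckets;
    explicit ChainedSet(std::size_t m) : buckets(m) {}

    std::size_t slot(int key) const {
        return static_cast<std::size_t>(key) % buckets.size();
    }

    void insert(int key) { buckets[slot(key)].push_back(key); }

    // Search touches only one chain; the cost is that chain's length, not n.
    bool contains(int key) const {
        for (int k : buckets[slot(key)])
            if (k == key) return true;
        return false;
    }
};
```

A lookup walks only the chain for its own bucket, so as long as chains stay short on average, lookups stay O(1) regardless of the total element count n.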

C linked list or hash table for matrix operations

I have a matrix in C with size m x n, where the size isn't known in advance. I need operations on the matrix such as: delete the first element and find the i-th element (the size wouldn't be too big, from 10 to 50 columns). Which is more efficient to use, a linked list or a hash table? And how can I map a column of the matrix to one element of the linked list or hash table, depending on which I choose?
Thanks
Linked lists don't provide very good random access, so from that perspective, you might not want to look in to using them to represent a matrix, since your lookup time will take a hit for each element you attempt to find.
Hashtables are very good for looking up elements, as they can provide near constant-time lookup for any given key, assuming the hash function is decent (using a well-established hashtable implementation would be wise).
Provided with the constraints that you have given though, a hashtable of linked lists might be a suitable solution, though it would still present you with the problem of finding the ith element, as you'd still need to iterate through each linked list to find the element you want. This would give you O(1) lookup for the row, but O(n) for the column, where n is the column count.
Furthermore, this is difficult because you'd have to make sure EVERY list in your hashtable is updated with the appropriate number of nodes as the number of columns grows/shrinks, so you're not buying yourself much in terms of space complexity.
A 2D array is probably best suited for representing a matrix, where you provide some capability of allowing the matrix to grow by efficiently managing memory allocation and copying.
An alternate method would be to look at something like the std::vector in lieu of the linked list, which acts like an array in that it's contiguous in memory, but will allow you the flexibility of dynamically growing in size.
If it's lookups you're after, a hash table works; the average runtime would be O(1).
For deletion/get/set at given indices, a 2D array would be optimal at O(1).
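Given the question's constraints (10 to 50 columns, delete-first and find-i-th), the column-per-element idea might be sketched like this. This is C++ rather than plain C, and the names are illustrative, not a prescribed design:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Each entry of the outer container is one matrix column.
using Column = std::vector<int>;
using Matrix = std::deque<Column>;

// "Find i-th element": direct indexing into the deque, O(1).
const Column& column_at(const Matrix& m, std::size_t i) {
    return m[i];
}

// "Delete first element": O(1) with a deque (it would be O(n) with a vector,
// since a vector must shift every remaining column).
void drop_first_column(Matrix& m) {
    m.pop_front();
}
```

With only a few dozen columns, either container is fast in practice; the deque is chosen here purely because both requested operations become O(1).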

Why does a hash table take up more memory than other data-structures?

I've been doing some reading about hash tables, dictionaries, etc. All the literature and videos I have watched refer to hash tables as having the space/time trade-off property.
I am struggling to understand why a hash table takes up more space than, say, an array or a list with the same number of total elements (values)? Does it have something to do with actually storing the hashed keys?
As far as I understand and in basic terms, a hash table takes a key identifier (say some string), passes it through some hashing function, which spits out an index to an array or some other data-structure. Apart from the obvious memory usage to store your objects (values) in the array or table, why does a hash table use up more space? I feel like I am missing something obvious...
Like you say, it's all about the trade-off between lookup time and space. The larger the number of spaces (buckets) the underlying data structure has, the greater the number of locations the hash function has where it can potentially store each item, and so the chance of a collision (and therefore worse than constant-time performance) is reduced. However, having more buckets obviously means more space is required. The ratio of number of items to number of buckets is known as the load factor, and is explained in more detail in this question: What is the significance of load factor in HashMap?
In the case of a minimal perfect hash function, you can achieve O(1) performance storing n items in n buckets (a load factor of 1).
As you mentioned, the underlying structure of a hash table is an array, the most basic type in the data-structure world.
To make a hash table fast, i.e. to support O(1) operations, the underlying array's capacity must be more than enough. The term load factor is used to evaluate this: it is the ratio of the number of elements in the hash table to the number of cells in the table, and it measures how full (or empty) the table is.
To keep the hash table fast, the load factor can't be greater than some threshold value. For example, with the quadratic-probing collision-resolution method, the load factor should not be greater than 0.5. When the load factor approaches 0.5 while inserting new elements, we need to rehash the table to meet the requirement.
So a hash table's high runtime performance is bought with extra space usage. This is the time/space tradeoff.
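The rehash trigger described above reduces to simple arithmetic. A sketch, using the 0.5 threshold quoted for quadratic probing (std::unordered_map exposes the same idea through max_load_factor):

```cpp
#include <cstddef>

// Decide whether a table needs rehashing, given a 0.5 load-factor threshold.
// Comparing 2*elements against cells avoids floating-point division.
bool needs_rehash(std::size_t elements, std::size_t cells) {
    return 2 * elements >= cells;   // load factor = elements / cells >= 0.5
}
```

With 3 elements in 8 cells the load factor is 0.375 and no rehash is needed; inserting a 4th element reaches 0.5 and triggers one, typically into a table of roughly double the size.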

Delphi array elements alphanumeric sort order?

Is the best way to sort an array in Delphi "alphanumeric"?
I found this comment in some old code of my application:
" The elements of this array must be in ascending, alphanumeric
sort order."
If so, what could be the reason?
-Vas
There's no "best" way as to how to sort the elements of an array (or any collection for that fact). Sort is a humanized characteristic (things are not usually sorted) so I'm guessing the comment has more to do with what your program is expecting.
More concretely, there's probably another section of code elsewhere that expects the array elements to be sorted alphanumerically. It can be something as simple as displaying the array in a TreeView already ordered, so that the calling code doesn't have to sort it first.
Arrays are represented as a contiguous memory assignment so that access is fast. Internally the compiler just does a call to GetMem asking for SizeOf(Type) * array size. There's nothing in the way the elements are sorted that affects the performance or memory size of the arrays in general. It MUST be in the program logic.
Most often an array is sorted to provide faster search times. Given a list of length L, I can compare with the midpoint (L DIV 2) and quickly determine if I need to look at the greater half, or the lesser half, and recursively continue using this pattern until I either have nothing to divide by or have found my match. This is what is called a Binary search. If the list is NOT sorted, then this type of operation is not available and instead I must inspect every item in the list until I reach the end.
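The halving procedure just described, as an iterative sketch. The surrounding question is about Delphi, but the algorithm is language-neutral, so C++ is used here for consistency with the rest of this page:

```cpp
#include <vector>

// Classic binary search over a sorted vector: compare with the midpoint and
// discard the half that cannot contain the target, until the range is empty.
// Returns the index of 'target', or -1 if it is not present.
int binary_search_index(const std::vector<int>& sorted, int target) {
    int lo = 0, hi = (int)sorted.size() - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (sorted[mid] == target) return mid;
        if (sorted[mid] < target)  lo = mid + 1;   // look in the greater half
        else                       hi = mid - 1;   // look in the lesser half
    }
    return -1;   // exhausted the range without a match
}
```

Each iteration halves the remaining range, so the search costs O(log n) comparisons instead of the O(n) of a linear scan over an unsorted array.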
No, there is no "best way" of sorting. And that's one of the reasons why you have multiple sorting techniques out there.
With QuickSort, you even provide the comparison function where you determine what order you ultimately want.
Sorting an array in some way is useful when you're trying to do a binary search on the array. A binary search can be extremely fast compared to other methods. But if the sort order is wrong, the search will be unable to find the record.
Other reasons to keep arrays sorted are almost always for cosmetic reasons, to decide how the array is sent to some output.
The best way to re-order an array depends on the length of the array and the type of data it contains. A QuickSort algorithm would give a fast result in most cases. Delphi uses it internally when you're working with string lists and some other lists. Question is, do you really need to sort it? Does it even really need to stay an array?
But the best way to keep an array sorted is by keeping it sorted from the first element that you add to it! In general, I write a wrapper around my array types, which takes care of keeping the array ordered. The 'Add' method searches for the biggest value in the array that's less than or equal to the value I want to add, then inserts the new item right after that position. To me, that would be the best solution. (With big arrays you could use the binary search method again to find the location where you need to insert the new record. It's slower than appending records to the end, but you never have to wonder if it's sorted or not, since it is.)
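The "insert in the right place" wrapper described in that last answer can be sketched in C++; std::lower_bound performs the binary search for the insertion point, and a Delphi version would follow the same idea:

```cpp
#include <algorithm>
#include <vector>

// Insert 'value' so the vector stays sorted: binary-search the position,
// then let insert() shift the tail right. This mirrors the wrapper 'Add'
// method described above.
void add_sorted(std::vector<int>& a, int value) {
    auto pos = std::lower_bound(a.begin(), a.end(), value);   // O(log n) search
    a.insert(pos, value);                                     // O(n) shift
}
```

With the id[] example from earlier, inserting 1005 lands it between 1000 and 1010, and the array is always ready for binary search without a separate sorting pass.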
