Hash Table With Chaining Search Time?

If I implement a hash table, I understand that insertion is done in constant time. I also understand I can find an item in constant time if there is no collision. However, suppose I insert an item and chain it with a linked list at some arbitrary index, say bucket 2, and it ends up three links down that bucket's list. Is searching for it O(n) time?

This is a misunderstanding of O(n) time. Big-O analysis is about the general case, not a specific instance. Intuitively, think of your hash table doing thousands or millions of lookups over time; take a step back and judge whether it is doing what a hash table is supposed to do.
If you had a completely degenerate hash table that hashed everything to the same slot, you would have O(n) lookup performance.
If n >> m, where n is the number of elements stored and m is the size of the hash table, your lookup performance degrades to O(n): each lookup has to walk a chain of roughly n/m elements, which is itself proportional to n when m is fixed.
In general the performance of the hash table relates to the average chain length. If this average is a (small) constant, such that it is not a function of n, you have the desired O(1) lookup performance.
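As a concrete illustration, here is a minimal sketch of separate chaining with integer keys (the type and function names are invented for this example, not from the question). A lookup walks a single bucket's chain; if items spread evenly over m buckets, that chain holds about n/m entries, so keeping m proportional to n keeps lookups O(1) on average.

    #include <cstddef>
    #include <list>
    #include <vector>

    // Minimal sketch of separate chaining with integer keys (illustrative only).
    struct ChainedSet {
        std::vector<std::list<int>> buckets;

        explicit ChainedSet(std::size_t m) : buckets(m) {}

        std::size_t slot(int key) const {
            return static_cast<std::size_t>(key) % buckets.size();
        }

        void insert(int key) { buckets[slot(key)].push_back(key); }  // O(1)

        bool contains(int key) const {                               // O(chain length)
            for (int k : buckets[slot(key)])
                if (k == key) return true;
            return false;
        }
    };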

Related

How does hash table work for dynamically lengthed array?

In cases where the length of an array is fixed, it makes sense to me that the time complexity of hash tables is O(1). However, I don't get how hash tables work for dynamically sized structures such as a linked list. Direct indexing is clearly not possible when the elements are scattered all over memory.
You are correct: a hash table whose backing storage is a linked list would not be O(1), because locating a slot requires an O(n) traversal. However, linked lists are not the only expandable structure.
You could, for example, use a resizable vector, such as one that doubles in size each time it needs to expand. That is directly addressable without an O(n) search, so it satisfies the O(1) condition.
Keep in mind that resizing the vector would almost certainly change the formula that allocates items into individual buckets of that vector, meaning there's a good chance you'd have to recalculate the buckets into which every existing item is stored.
That would still amortise to O(1), even with a single insert operation possibly having to do an O(n) reallocation, since the reallocations would be infrequent, and likely to become less frequent over time as the vector gets larger.
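Here is a rough sketch of that resize-and-rehash step, assuming integer keys and simple modulo bucketing (all names are invented for illustration). Because a key's bucket is computed from the current capacity, doubling the bucket array forces every stored key to be re-bucketed, which is exactly the O(n) step that amortises away.

    #include <cstddef>
    #include <vector>

    // Illustrative sketch only: when the bucket array doubles, every stored key
    // must be re-bucketed, because key % capacity changes with the capacity.
    struct GrowableTable {
        std::vector<std::vector<int>> buckets;
        std::size_t count = 0;

        GrowableTable() : buckets(8) {}

        static std::size_t bucketOf(int key, std::size_t m) {
            return static_cast<std::size_t>(key) % m;
        }

        void insert(int key) {
            if (count + 1 > buckets.size())          // simplistic growth trigger
                rehash(buckets.size() * 2);          // O(n), but infrequent
            buckets[bucketOf(key, buckets.size())].push_back(key);
            ++count;
        }

        void rehash(std::size_t newSize) {
            std::vector<std::vector<int>> next(newSize);
            for (const auto& chain : buckets)
                for (int k : chain)
                    next[bucketOf(k, newSize)].push_back(k);
            buckets.swap(next);
        }
    };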
You can still map the elements of a linked list to a hash table. Yes, it's true we do not know the size of the list beforehand, so we cannot use a C-style or non-expandable array to represent our hash table. This is where vectors come into play (or ArrayList if you're from Java).
A crash course on vectors: if there is no more space in the current array, allocate a new array of double the size and copy the existing elements into it. More formally, if we want to insert the (n+1)-th element into an array of size n, a new array of size 2n is created. On the next overflow it creates an array of size 4n, and so on.
The following code can map values of a linked list into a hash table.
    #include <iostream>
    #include <vector>

    struct Node { int val; Node* next; };

    // Count how many times each value appears in the list, growing the vector
    // on demand. Assumes the values are non-negative, since they are used as indices.
    void map(Node* root) {
        std::vector<int> hash;
        while (root) {
            if (root->val >= (int)hash.size())
                hash.resize(root->val + 1, 0);   // grow so the index is valid
            hash[root->val]++;
            root = root->next;
        }
        for (int i = 0; i < (int)hash.size(); i++)
            std::cout << hash[i] << " ";
    }

Time Complexity for Insertion Sort on Ref. Based Linked-List?

Here is the actual question:
"what is the time complexity if the insertion sort is done on a reference-based linked list?"
I am thinking it would be O(1), right? Because you check the nodes until you find the PREVIOUS node and the node that should come AFTER, set the pointers, and you're good. Therefore, not EVERY node would need to be checked, so it can't be O(n).
Big O notation generally refers to the worst case complexity.
Inserting into an already sorted list (which I think is how you are understanding the question, based on your final paragraph) has a complexity of O(n), since the worst case is inserting an element that goes at the end of the list, meaning there are n iterations.
Performing an insertion sort on an unsorted linked list would involve inserting n elements into a linked list, giving a complexity of O(n^2).
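To make the O(n^2) bound concrete, here is a rough sketch of insertion sort over a singly linked list, assuming a simple Node type (the names are illustrative, not from the question):

    struct Node { int val; Node* next; };

    // Insertion sort on a singly linked list: each node is detached from the
    // input and spliced into its sorted position. Finding that position is an
    // O(n) scan per node, giving O(n^2) overall in the worst case.
    Node* insertionSort(Node* head) {
        Node* sorted = nullptr;                      // head of the sorted part
        while (head) {
            Node* node = head;
            head = head->next;
            if (!sorted || node->val <= sorted->val) {   // new smallest element
                node->next = sorted;
                sorted = node;
            } else {
                Node* prev = sorted;                 // scan for the insertion point
                while (prev->next && prev->next->val < node->val)
                    prev = prev->next;
                node->next = prev->next;
                prev->next = node;
            }
        }
        return sorted;
    }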

C linked list or hash table for matrix operations

I have a matrix in C with size m x n, and the size isn't known in advance. I need operations on the matrix such as: delete the first element and find the i-th element (the size won't be too big, from 10 to 50 columns). Which is more efficient to use, a linked list or a hash table? And how can I map a column of the matrix to one element of the linked list or hash table, depending on which I choose?
Thanks
Linked lists don't provide very good random access, so from that perspective you might not want to look into using them to represent a matrix, since your lookup time will take a hit for each element you attempt to find.
Hash tables are very good for looking up elements, as they can provide near-constant-time lookup for any given key, assuming the hash function is decent (using well-established hash table implementations would be wise).
Provided with the constraints that you have given though, a hashtable of linked lists might be a suitable solution, though it would still present you with the problem of finding the ith element, as you'd still need to iterate through each linked list to find the element you want. This would give you O(1) lookup for the row, but O(n) for the column, where n is the column count.
Furthermore, this is difficult because you'd have to make sure EVERY list in your hashtable is updated with the appropriate number of nodes as the number of columns grows/shrinks, so you're not buying yourself much in terms of space complexity.
A 2D array is probably best suited for representing a matrix, where you provide some capability of allowing the matrix to grow by efficiently managing memory allocation and copying.
An alternate method would be to look at something like the std::vector in lieu of the linked list, which acts like an array in that it's contiguous in memory, but will allow you the flexibility of dynamically growing in size.
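Here is a small sketch of that vector-based approach, where each column of the matrix is one element of a std::vector (the type and function names are invented for illustration, following the "map a column to one element" idea from the question):

    #include <cstddef>
    #include <vector>

    // Each column is one element of the outer vector. Finding the i-th column
    // is a direct index, and deleting the first column is a single erase, which
    // shifts the remaining columns left (cheap for 10-50 columns).
    using Column = std::vector<int>;
    using Matrix = std::vector<Column>;

    const Column& column(const Matrix& m, std::size_t i) {
        return m[i];                    // O(1) random access
    }

    void deleteFirstColumn(Matrix& m) {
        if (!m.empty())
            m.erase(m.begin());         // O(number of columns)
    }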
If it's for this kind of work, use a hash table; the average runtime would be O(1).
For deletion/get/set given indices in O(1), a 2D array would be optimal.

Why does a hash table take up more memory than other data-structures?

I've been doing some reading about hash tables, dictionaries, etc. All the literature and videos I have read or watched describe hash tables as having a space/time trade-off.
I am struggling to understand why a hash table takes up more space than, say, an array or a list with the same total number of elements (values). Does it have something to do with actually storing the hashed keys?
As far as I understand and in basic terms, a hash table takes a key identifier (say some string), passes it through some hashing function, which spits out an index to an array or some other data-structure. Apart from the obvious memory usage to store your objects (values) in the array or table, why does a hash table use up more space? I feel like I am missing something obvious...
Like you say, it's all about the trade-off between lookup time and space. The larger the number of spaces (buckets) the underlying data structure has, the greater the number of locations the hash function has where it can potentially store each item, and so the chance of a collision (and therefore worse than constant-time performance) is reduced. However, having more buckets obviously means more space is required. The ratio of number of items to number of buckets is known as the load factor, and is explained in more detail in this question: What is the significance of load factor in HashMap?
In the case of a minimal perfect hash function, you can achieve O(1) performance storing n items in n buckets (a load factor of 1).
As you mentioned, the underlying structure of a hash table is an array, the most basic type in the data-structure world.
To make a hash table fast, i.e. to support O(1) operations, the underlying array's capacity must be more than enough. The load factor is used to evaluate this: it is the ratio of the number of elements in the hash table to the total number of cells, and it measures how full the table is.
To keep the hash table fast, the load factor can't be allowed to exceed some threshold value. For example, with the quadratic probing collision-resolution method, the load factor should not be greater than 0.5. When the load factor approaches 0.5 while inserting new elements, the table has to be rehashed into a larger array to keep meeting the requirement.
So the hash table's high run-time performance is bought with extra space usage. This is the time/space trade-off.
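As a tiny sketch of that threshold logic (the function name and the 0.5 default are just illustrative, taken from the quadratic-probing example above):

    #include <cstddef>

    // Load factor = stored elements / total cells. Keeping it below the
    // threshold means a sizeable fraction of the cells stays deliberately
    // empty; that unused space is what buys the O(1) behaviour.
    bool needsRehash(std::size_t elements, std::size_t cells, double threshold = 0.5) {
        return static_cast<double>(elements) / static_cast<double>(cells) > threshold;
    }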

Find max/min value of column in Mnesia in constant time

How can I in constant time (or closest possible) find the maximum or minimum value on an indexed column in an Mnesia table?
I would do it outside the Mnesia database. Keep an explicit min and an explicit max by having a process which learns about these values whenever there is an insert into the table. This gives you awfully fast, constant-time lookup on the values.
If you can do with O(lg n) time then you can make the table an ordered_set. From there, first/1 and last/1 should give you what you want, given that the key contains the thing you are ordering by. But this also slows down other queries in general to O(lg n).
A third trick is to go by an approximate value. Once in a while you scan the table and note the max and min values. This materializes into what you want, but the value might not be up to date if it has been a long time since you last scanned.
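This is not Mnesia code, but a small C++ sketch of the first idea, keeping an explicit min/max that is updated on every insert (the type is invented for illustration; handling deletes would need extra care, e.g. the periodic rescan mentioned above):

    #include <limits>

    // Update two running values on every insert so that reading the min or max
    // is a plain O(1) lookup, independent of the table size.
    struct MinMaxTracker {
        long long minValue = std::numeric_limits<long long>::max();
        long long maxValue = std::numeric_limits<long long>::min();

        void onInsert(long long value) {
            if (value < minValue) minValue = value;
            if (value > maxValue) maxValue = value;
        }
    };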
Good question but I don't think it is possible. A quick look through mnesia and qlc documentation did not give me any clues on the subject.
It seems to me that the secondary-key facility in mnesia is incomplete and thus very limited in features. Not to mention the horrible mnesia startup times while loading indexed tables.
I think the most reliable solution in your case would be to do explicit indexing, e.g. creating, and keeping in sync, a side table ordered on primary keys which are in fact the values you want to index by.
