Time complexity of looping through a linked list of size n with nested for loop - linked-list

I'm not sure if this has been asked already, but I ran into a time complexity question and couldn't find an answer to it.
I understand the time to loop through a linked list of size n is O(n), but if that linked list was divided into groups of k, and the heads of each group was stored in a list, what would the time complexity be to use a nested for loop to go through each groups of k in the list. Would it still be O(n)?

Yes.
You would still have a O(𝑛) complexity for visiting each node, and have an overhead of O(𝑛/π‘˜) to go from one group to the next. So if π‘˜ is a variable, then the complexity is O(𝑛+𝑛/π‘˜), and as 𝑛/π‘˜β‰€π‘› that is O(𝑛).
Search on sorted list(s)
If we alter the question to be about finding a given value in the list(s), then the grouped data structure can benefit from a jump search kind of algorithm, where first the outer list is traversed to find the list that has the right range to have the value. Then only that sublist needs to be traversed. If the groups are evenly spread, and the size of the outer list is approximately that of an inner list, (i.e. βˆšπ‘›), then the time complexity for searching a value is O(βˆšπ‘›)

Related

Pivot Table type of query in Cypher (in one pass)

I am trying to perform the following query in one pass but I conclude that it is impossible and would furthermore lead to some form of "nested" structure which is never good news in terms of performance.
I may however be missing something here, so I thought I might ask.
The underlying data structure is a many-to-many relationship between two entities A<---0:*--->B
The end goal is to obtain how many times are objects of entity B assigned to objects of entity A within a specific time interval as a percentage of total assignments.
It is exactly this latter part of the question that causes the headache.
Entity A contains an item_date field
Entity B contains an item_category field.
The presentation of the results can be expanded to a table whose columns are the distinct item_date and rows are the different item_category normalised counts. I am just mentioning this for clarity, the query does not have to return the results in that exact form.
My Attempt:
with 12*30*24*3600 as window_length, "1980-1-1" as start_date,
"1985-12-31" as end_date
unwind range(apoc.date.parse(start_date,"s","yyyy-MM-dd"),apoc.date.parse(end_date,"s","yyyy-MM-dd"),window_length) as date_step
match (a:A)<-[r:RELATOB]-(b:B)
where apoc.date.parse(a.item_date,"s","yyyy-MM-dd")>=date_step and apoc.date.parse(a.item_date,"s","yyyy-MM-dd")<(date_step+window_length)
with window_length, date_step, count(r) as total_count unwind ["code_A", "code_B", "code_C"] as the_code [MATCH THE PATTERN AGAIN TO COUNT SPECIFIC `item_code` this time.
I am finding it difficult to express this in one pass because it requires the equivalent of two independent GROUP BY-like clauses right after the definition of the graph pattern. You can't express these two in parallel, so you have to unwind them. My worry is that this leads to two evaluations: One for the total count and one for the partial count. The bit I am trying to optimise is some way of re-writing the query so that it does not have to count nodes it has "captured" before but this is very difficult with the implied way the aggregate functions are being applied to a set.
Basically, any attribute that is not an aggregate function becomes the stratification variable. I have to say here that a plain simple double stratification ("Grab everything, produce one level of count by item_date produce another level of count by item_code) does not work for me because there is NO WAY to control the width of the window_length. This means that I cannot compare between two time periods with different rates of assignments of item_codes because the time periods are not equal :(
Please note that retrieving the counts of item_code and then normalising for the sum of those particular codes within a period of time (externally to cypher) would not lead to accurate percentages because the normalisation there would be with respect to that particular subset of item_code rather than the total.
Is there a way to perform a simultaneous count of r within a time period but then (somehow) re-use the already matched a,b subsets of nodes to now evaluate a partial count of those specific b's that (b:{item_code:the_code})-[r2:RELATOB]-(a) where a.item_date...?
If not, then I am going to move to the next fastest thing which is to perform two independent queries (one for the total count, one for the partials) and then do the division externally :/ .
The solution proposed by Tomaz Bratanic in the comment is (I think) along these lines:
with 1*30*24*3600 as window_length,
"1980-01-01" as start_date,
"1985-12-31" as end_date
unwind range(apoc.date.parse(start_date,"s","yyyy-MM-dd"),apoc.date.parse(end_date,"s","yyyy-MM-dd"),window_length) as date_step
unwind ["code_A","code_B","code_c"] as the_code
match (a:A)<-[r:RELATOB]-(b:B)
where apoc.date.parse(a.item_date,"s","yyyy-MM-dd")>=date_step and apoc.date.parse(a.item_category,"s","yyyy-MM-dd")<(date_step+window_length)
return the_code, date_step, tofloat(sum(case when b.item_category=code then 1 else 0 end)/count(r)) as perc_count order by date_step asc
This:
Is working
It does exactly what I was after (after some minor modifications)
It even adds filling in the missing values with zero because of that ELSE 0 which is effectively forcing a zero even when no count data exists.
But in realistic conditions it is at least 30 seconds slower (no it is not, please see edit) than what I am currently using which re-matches. (And no, it is not because of the extra data that is now returned as the missing data are filled in, this is raw query time).
I thought that it might be worth attaching the query plans here:
This is the plan of the applying the same pattern twice but fast way of doing it:
This is the plan of the performing the count in one pass but slow way of doing it:
I might see how does time scales with data in the input later on, maybe the two are scaling at different rates but at this point, the "one-pass" seems to be already slower than the "two-pass" and frankly, I cannot see how it could get any faster with more data. This is already a simple count of 12 months over 3 categories distributed amongst 18k items (approximately).
Hope this might help others too.
EDIT:
While I had done this originally, there was another modification that I did not include where the second unwind goes AFTER the match. This slashes the time by 20 seconds below the "double match" as the unwind affects the return rather than multiple executions of the same query which now becomes:
with 1*30*24*3600 as window_length,
"1980-01-01" as start_date,
"1985-12-31" as end_date
unwind range(apoc.date.parse(start_date,"s","yyyy-MM-dd"),apoc.date.parse(end_date,"s","yyyy-MM-dd"),window_length) as date_step
match (a:A)<-[r:RELATOB]-(b:B)
where apoc.date.parse(a.item_date,"s","yyyy-MM-dd")>=date_step and apoc.date.parse(a.item_category,"s","yyyy-MM-dd")<(date_step+window_length)
unwind ["code_A","code_B","code_c"] as the_code
return the_code, date_step, tofloat(sum(case when b.item_category=code then 1 else 0 end)/count(r)) as perc_count order by date_step asc
And here is the execution plan for it too:
Original double match approximately 55790ms, Doing it in one pass (both unwinds BEFORE the match) 82306ms, Doing it in one pass (second unwind after the match) 23461ms.

Time Complexity for Insertion Sort on Ref. Based Linked-List?

Here is the actual question:
"what is the time complexity if the insertion sort is done on a reference-based linked list?"
I am thinking it would be O(1), right? Because you will check the nodes until you find the PREVIOUS, and what should be the node AFTER, set the pointers, and you're good. Therefore, not EVERY node would need to be checked so it can't be O(n).
Big O notation generally refers to the worst case complexity.
Inserting into an already sorted list (Which I think is how you are understanding the question based on your final paragraph) would have a complexity of O(n), since the worst case is inserting an element that goes at the end of the list, meaning there are n iterations.
Performing an insertion sort on an unsorted linked list would involve inserting n elements into a linked list, giving a complexity of O(n^2).

C linked list or hash table for matrix operations

I have matrix in C with size m x n. Size isn't known. I must to have operations on matrix such as : delete first element and find i-th element. (where size woudn't be too big , from 10 to 50 columns of matrix). What is more efficient to use, linked list or hash table? How can I map column of matrix to one element of linked list or hash table depens what I choose to use?
Thanks
Linked lists don't provide very good random access, so from that perspective, you might not want to look in to using them to represent a matrix, since your lookup time will take a hit for each element you attempt to find.
Hashtables are very good for looking up elements as they can provide near constant time lookup for any given key, assuming the hash function is decent (using well established hashtable implementations would be wise)
Provided with the constraints that you have given though, a hashtable of linked lists might be a suitable solution, though it would still present you with the problem of finding the ith element, as you'd still need to iterate through each linked list to find the element you want. This would give you O(1) lookup for the row, but O(n) for the column, where n is the column count.
Furthermore, this is difficult because you'd have to make sure EVERY list in your hashtable is updated with the appropriate number of nodes as the number of columns grows/shrinks, so you're not buying yourself much in terms of space complexity.
A 2D array is probably best suited for representing a matrix, where you provide some capability of allowing the matrix to grow by efficiently managing memory allocation and copying.
An alternate method would be to look at something like the std::vector in lieu of the linked list, which acts like an array in that it's contiguous in memory, but will allow you the flexibility of dynamically growing in size.
if its for work then use hash table, avg runtime would be O(1).
for deletion/get/set given indices at O(1) 2d arr would be optimal.

Best Possible algorithm to check if two linked lists are merging at any point? If so, where? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Linked list interview question
This is an interview question for which I don't have an answer.
Given Two lists, You cannot change list and you dont know the length.
Give best possible algorithm to:
Check if two lists are merging at any point?
If merging, at what point they are merging?
If I allow you to change the list how would you modify your algorithm?
I'm assuming that we are talking about simple linked lists and we can safely create a hash table of the list element pointers.
Q1: Iterate to end of both lists, If the respective last elements are the same, the lists merge at some point.
Complexity - O(N), space complexity - O(1)
Q2:
Put all elements of one list into a hash table
Iterate over 2nd list, probing the hash table for each element of the list. The first hit (if any) is the merge point, and we have the position in the 2nd list.
To get the position in the 1st list, iterate over the first list again looking for the element found in the previous step.
Time complexity - O(N). Space complexity - O(N)
Q3:
As Q1, but also reverse the direction of the list pointers.
Then iterate the reversed lists looking for the last common element - that is the merge point - and restoring the list to the original order.
Time complexity - O(N). Space complexity - O(1)
Number 1: Just iterate both and then check if they end with the same element. Thats O(n) and it cant be beaten (as it might possibly be the last element that is common, and getting there always takes O(n)).
Walk those two lists parallel by one element, add each element to Set of visited nodes (can be hash map, or simple set, you only need to check if you visited that node before). At each step check if you visited that node (if yes, then it's merging point), and add it to set of nodes if you visit it first time. Another version (as pointed by #reinier) is to walk only first list, store its nodes in Set and then only check second list against that Set. First approach is faster when your lists merge early, as you don't need to store all nodes from first list. Second is better at worst case, where both list don't merge at all, since it didn't store nodes from second list in Set
see 1.
Instead of Set, you can try to mark each node, but if you cannot modify structure, then it's not so helpful. You could also try unlink each visited node and link it to some guard node (which you check at each step if you encountered it while traversing). It saves memory for Set if list is long enough.
Traverse both the list and have a global variable for finding the number of NULL encountered . If they merge at some point there will be only 1 NULL else there will be two NULL.

Why is inserting in the middle of a linked list O(1)?

According to the Wikipedia article on linked lists, inserting in the middle of a linked list is considered O(1). I would think it would be O(n). Wouldn't you need to locate the node which could be near the end of the list?
Does this analysis not account for the finding of the node operation (though it is required) and just the insertion itself?
EDIT:
Linked lists have several advantages over arrays. Insertion of an element at a specific point of a list is a constant-time operation, whereas insertion in an array may require moving half of the elements, or more.
The above statement is a little misleading to me. Correct me if I'm wrong, but I think the conclusion should be:
Arrays:
Finding the point of insertion/deletion O(1)
Performing the insertion/deletion O(n)
Linked Lists:
Finding the point of insertion/deletion O(n)
Performing the insertion/deletion O(1)
I think the only time you wouldn't have to find the position is if you kept some sort of pointer to it (as with the head and the tail in some cases). So we can't flatly say that linked lists always beat arrays for insert/delete options.
You are correct, the article considers "Indexing" as a separate operation. So insertion is itself O(1), but getting to that middle node is O(n).
The insertion itself is O(1). Node finding is O(n).
No, when you decide that you want to insert, it's assumed you are already in the middle of iterating through the list.
Operations on Linked Lists are often done in such a way that they aren't really treated as a generic "list", but as a collection of nodes--think of the node itself as the iterator for your main loop. So as you're poking through the list you notice as part of your business logic that a new node needs to be added (or an old one deleted) and you do so. You may add 50 nodes in a single iteration and each of those nodes is just O(1) the time to unlink two adjacent nodes and insert your new one.
For purposes of comparing with an array, which is what that chart shows, it's O(1) because you don't have to move all the items after the new node.
So yes, they are assuming that you already have the pointer to that node, or that getting the pointer is trivial. In other words, the problem is stated: "given node at X, what is the code to insert after this node?" You get to start at the insert point.
Insertion into a linked list is different than iterating across it. You aren't locating the item, you are resetting pointers to put the item in there. It doesn't matter if it is going to be inserted near the front end or near the end, the insertion still involves pointers being reassigned. It'll depend on how it was implemented, of course, but that is the strength of lists - you can insert easily. Accessing via index is where an array shines. For a list, however, it'll typically be O(n) to find the nth item. At least that's what I remember from school.
Inserting is O(1) once you know where you're going to put it.
Does this analysis not account for the finding of the node operation (though it is required) and just the insertion itself?
You got it. Insertion at a given point assumes that you already hold a pointer to the item that you want to insert after:
InsertItem(item * newItem, item * afterItem)
No, it does not account for searching. But if you already have hold of a pointer to an item in the middle of the list, inserting at that point is O(1).
If you have to search for it, you'd have to add on the time for searching, which should be O(n).
Because it does not involve any looping.
Inserting is like:
insert element
link to previous
link to next
done
this is constant time in any case.
Consequently, inserting n elements one after the other is O(n).
The most common cases are probably inserting at the begining or at the end of the list (and the ends of the list might take no time to find).
Contrast that with inserting items at the begining or the end of an array (which requires resizing the array if it's at the end, or resizing and moving all the elements if it's at the begining).
The article is about comparing arrays with lists. Finding the insert position for both arrays and lists is O(N), so the article ignores it.
O(1) is depending of that fact that you have a item where you will insert the new item. (before or after). If you donΒ΄t, itΒ΄s O(n) becuase you must find that item.
I think it's just a case of what you choose to count for the O() notation. In the case of inserting the normal operation to count is copy operations. With an array, inserting in the middle involves copying everything above the location up in memory. With a linked list, this becomes setting two pointers. You need to find the location no matter what to insert.
If you have the reference of the node to insert after the operation is O(1) for a linked list.
For an array it is still O(n) since you have to move all consequtive nodes.

Resources