Merge sorting with linked chains in java - linked-list

I am looking for a template of sorts for merging two linked chains that have already been sorted. I'm still fairly new to Java, and this seems to be a pretty challenging task to accomplish with the limited knowledge I have. I understand how to merge sort an array, but when it comes to linked lists I seem to be drawing a blank. Any help you could give me, be it actual code or simply advice on where to start, would be greatly appreciated.
Thank you for your time!

If the two linked lists are already sorted, merging them is easy. I'll describe the algorithm, but you should write the code yourself, since this looks like a school project. First, create a new linked list and make its head the smaller of list1Head and list2Head. Then walk the two lists: at each step, pick the smaller of the two current nodes, append it to the new list, and advance the current pointer of the list it came from to its .Next. When one list runs out of nodes, append the rest of the other list directly to the new list. Done.
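The walk described above can be sketched in Java roughly as follows. This is only a minimal sketch under assumptions: `SortedMerge`, `Node`, `of`, and `render` are hypothetical names, the values are plain ints, and the existing nodes are relinked rather than copied into a brand-new list.

```java
// Minimal sketch; SortedMerge and Node are hypothetical names, not from the question.
class SortedMerge {
    static final class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Merges two already-sorted lists by relinking their nodes; returns the merged head.
    static Node merge(Node a, Node b) {
        Node dummy = new Node(0, null); // placeholder head so appending needs no special case
        Node tail = dummy;
        while (a != null && b != null) {
            if (a.value <= b.value) {   // pick the smaller current node
                tail.next = a;
                a = a.next;
            } else {
                tail.next = b;
                b = b.next;
            }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b; // append whatever is left of the other list
        return dummy.next;
    }

    // Helper for building a list from values (assumed to be given in sorted order).
    static Node of(int... values) {
        Node head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            head = new Node(values[i], head);
        }
        return head;
    }

    // Helper for printing a list as space-separated values.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.value);
        }
        return sb.toString();
    }
}
```

For example, `SortedMerge.merge(SortedMerge.of(1, 4, 9), SortedMerge.of(2, 3, 10, 11))` should yield the list 1 2 3 4 9 10 11. The dummy head is a design choice that avoids treating the first node specially.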

Can't you look at the first element in each list and take the smaller one? This is the start of the new list. Remove it from the front of whichever list it came from. Now look at the first elements again, take the smaller one, and make it the second element in the new list. Then just repeat this process, zipping the two lists together.
If you want to avoid creating a new list, then just find the smallest element, look at the node it is pointing at and at the beginning of the other list, and see which is smaller. If you are not already pointing at the smaller one, then update the pointer so it is. Then rinse and repeat.

Related

How can I one-hot encode the data which has multiple same values for different properties?

I have data containing candidates who look for a job. The original data I got was a complete mess but I managed to enhance it. Now, I am facing an issue which I am not able to resolve.
One candidate record looks like
https://i.imgur.com/LAPAIbX.png
Since ML algorithms cannot work with categorical data, I want to encode this. My goal is to have a candidate record looking like this:
https://i.imgur.com/zzsiDzy.png
What I need to change is to add a new column for each possible value that exists in Knowledge1, Knowledge2, Knowledge3, Knowledge4, Tag1, and Tag2 of original data, but without repetition. I managed to encode it to get way more attributes than I need, which results in an inaccurate model. The way I tried gives me newly created attributes Jscript_Knowledge1, Jscript_Knowledge2, Jscript_Knowledge3 and so on, for each possible option.
If the explanation is not clear enough please let me know so that I could explain it further.
Thanks and any help is highly appreciated.
Cheers!
I have some understanding of your problem based on your explanation. I will try to describe how I would approach it. If that does not solve your problem, I may need more explanation to understand it. Let's get started.
For all the candidate data that you have, collect a master skill/knowledge list.
This list becomes your columns.
For each candidate, if they have a given skill, that column becomes 1 in their record; otherwise it stays 0.
This is the essence of one-hot encoding. However, because the same skill is scattered across multiple columns, automatic encoding is not working well for you.
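The master-list steps above could be sketched as follows. This is a minimal illustration in Java, not the asker's pipeline: `SkillEncoder`, `vocabulary`, and `encode` are hypothetical names, and each candidate is assumed to be already reduced to a list of skill strings gathered from the Knowledge and Tag columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Minimal sketch; SkillEncoder is a hypothetical name, not from the question.
class SkillEncoder {
    // Collects every distinct skill across all candidates, sorted for a stable column order.
    static List<String> vocabulary(List<List<String>> candidates) {
        TreeSet<String> all = new TreeSet<>();
        for (List<String> skills : candidates) {
            all.addAll(skills);
        }
        return new ArrayList<>(all);
    }

    // One column per skill: 1 if the candidate has it, 0 otherwise.
    static int[] encode(List<String> skills, List<String> vocabulary) {
        int[] row = new int[vocabulary.size()];
        for (int i = 0; i < vocabulary.size(); i++) {
            if (skills.contains(vocabulary.get(i))) {
                row[i] = 1;
            }
        }
        return row;
    }
}
```

With this shape, a skill such as Jscript gets a single column no matter which of the original Knowledge or Tag columns it appeared in.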
An alternative approach could be:
For each candidate, collect all the knowledge skills into one list and assign it to a single knowledge column, and collect the tags into another list and assign it to a single tag column, instead of the current 4 (Knowledge) + 2 (Tag) columns.
Sort the knowledge (and tag) list alphabetically within this column.
Automatic one-hot encoding after this may yield fewer columns than before.
Hope this helps!

vtd xml diff implementation

I have many VTD+XML indexes for different versions of the same file, and I am hoping to implement a diff-like method that returns the XPaths of nodes that have been modified between versions, as well as the difference between the text within those nodes.
I figure that using an existing algorithm such as the O(nd) difference algorithm would be best for comparing the text within two nodes. The approach I envisioned is to traverse the two documents simultaneously and store the XPath of any node that contains text variations.
The issue is that once I encounter new or removed nodes, how do I determine whether the node is in fact an inserted/removed node or a variation of an existing node?
Or maybe there is another approach I should be taking?
Maybe my interpretation of your question is not exactly on the mark. But I feel that what you are trying to do may not have easy answers... consider the following XML snippet
<a>
<b>text1</b>
<b>text1</b>
</a>
and
<a>
<b>text2</b>
<b>text1</b>
</a>
You could say the second XML is simply the first one with the text1 of the first b node replaced by text2.
But you could also say the second XML is the first one after removing the first b node, changing text1 of the second b node to text2, and then inserting a b node with text1 after it.
In summary, it seems you don't just want to know what the differences are, but also the changes that led to those differences. This is difficult, as different sequences of changes can lead to the same output.

single vs double Linked list

There is a table I found below
My question is whether or not it is true that a single and double linked list have the same operation run times like the table seems to show. I would think in the deletion case for example, a double linked list would be better since we have access to previous. So is the table wrong on that being O(n) for singly linked lists?
If they are all the same, does this similarity hold for a circular one as well?
Thanks.
Here is my answer to your question:
Whether or not a doubly linked list gives you access to the previous node, it doesn't affect the time complexity we calculate in terms of Big O notation, though I think it does give you some convenience.
Yes, they are all the same, and the similarity holds for a circular one as well.

Best Possible algorithm to check if two linked lists are merging at any point? If so, where? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Linked list interview question
This is an interview question for which I don't have an answer.
Given two lists: you cannot change the lists and you don't know their lengths.
Give best possible algorithm to:
Check if two lists are merging at any point?
If merging, at what point they are merging?
If I allow you to change the list how would you modify your algorithm?
I'm assuming that we are talking about simple linked lists and we can safely create a hash table of the list element pointers.
Q1: Iterate to end of both lists, If the respective last elements are the same, the lists merge at some point.
Complexity - O(N), space complexity - O(1)
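The Q1 check can be sketched like this in Java. It is only an illustration under assumptions: `MergeCheck` and `Node` are hypothetical names, the lists are singly linked, acyclic, and end in null, and "the same element" means the same node object (reference equality).

```java
// Minimal sketch; MergeCheck and Node are hypothetical names.
class MergeCheck {
    static final class Node {
        Node next;
    }

    // Walks to the last node of a list; returns null for an empty list.
    static Node last(Node head) {
        while (head != null && head.next != null) {
            head = head.next;
        }
        return head;
    }

    // Two lists merge at some point exactly when they share the same last node.
    static boolean merges(Node a, Node b) {
        Node lastA = last(a);
        return lastA != null && lastA == last(b);
    }
}
```

Each list is traversed once and only two references are kept, which matches the O(N) time and O(1) space stated above.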
Q2:
Put all elements of one list into a hash table
Iterate over 2nd list, probing the hash table for each element of the list. The first hit (if any) is the merge point, and we have the position in the 2nd list.
To get the position in the 1st list, iterate over the first list again looking for the element found in the previous step.
Time complexity - O(N). Space complexity - O(N)
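A minimal Java sketch of the Q2 approach follows. `MergePoint`, `Node`, and `find` are hypothetical names; the set is identity-based (built on `IdentityHashMap`) so that nodes are compared by reference, matching the "hash table of the list element pointers" assumption above.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Minimal sketch; MergePoint and Node are hypothetical names.
class MergePoint {
    static final class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    // Returns the first node of b that also belongs to a, or null if the lists never merge.
    static Node find(Node a, Node b) {
        // Identity-based set: nodes are compared by reference, not by equals().
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        for (Node n = a; n != null; n = n.next) {
            seen.add(n);
        }
        for (Node n = b; n != null; n = n.next) {
            if (seen.contains(n)) {
                return n; // first hit is the merge point
            }
        }
        return null;
    }
}
```

The position of the merge point in the first list can then be recovered by walking that list again until the returned node is reached.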
Q3:
As Q1, but also reverse the direction of the list pointers.
Then iterate the reversed lists looking for the last common element - that is the merge point - and restoring the list to the original order.
Time complexity - O(N). Space complexity - O(1)
Number 1: Just iterate over both and then check whether they end with the same element. That's O(n), and it can't be beaten (since the only common element might be the last one, and getting there always takes O(n)).
Walk the two lists in parallel, one element at a time, adding each element to a Set of visited nodes (it can be a hash map or a simple set; you only need to check whether you have visited that node before). At each step, check whether you have already visited the node: if so, it is the merge point; if not, add it to the set. Another version (as pointed out by @reinier) is to walk only the first list, store its nodes in a Set, and then check the second list against that Set. The first approach is faster when the lists merge early, since you don't need to store all the nodes of the first list. The second is better in the worst case, where the lists don't merge at all, since it doesn't store the nodes of the second list in the Set.
see 1.
Instead of using a Set, you can try to mark each node, but if you cannot modify the structure, that is not much help. You could also try unlinking each visited node and linking it to some guard node (and check at each step whether you have reached the guard while traversing). For a long enough list, this saves the memory the Set would need.
Traverse both lists and keep a global counter of the number of distinct NULL terminators encountered. If the lists merge at some point there will be only one NULL; otherwise there will be two.

Why is inserting in the middle of a linked list O(1)?

According to the Wikipedia article on linked lists, inserting in the middle of a linked list is considered O(1). I would think it would be O(n). Wouldn't you need to locate the node which could be near the end of the list?
Does this analysis not account for the finding of the node operation (though it is required) and just the insertion itself?
EDIT:
Linked lists have several advantages over arrays. Insertion of an element at a specific point of a list is a constant-time operation, whereas insertion in an array may require moving half of the elements, or more.
The above statement is a little misleading to me. Correct me if I'm wrong, but I think the conclusion should be:
Arrays:
Finding the point of insertion/deletion O(1)
Performing the insertion/deletion O(n)
Linked Lists:
Finding the point of insertion/deletion O(n)
Performing the insertion/deletion O(1)
I think the only time you wouldn't have to find the position is if you kept some sort of pointer to it (as with the head and the tail in some cases). So we can't flatly say that linked lists always beat arrays for insert/delete operations.
You are correct, the article considers "Indexing" as a separate operation. So insertion is itself O(1), but getting to that middle node is O(n).
The insertion itself is O(1). Node finding is O(n).
No, when you decide that you want to insert, it's assumed you are already in the middle of iterating through the list.
Operations on Linked Lists are often done in such a way that they aren't really treated as a generic "list", but as a collection of nodes--think of the node itself as the iterator for your main loop. So as you're poking through the list you notice as part of your business logic that a new node needs to be added (or an old one deleted) and you do so. You may add 50 nodes in a single iteration and each of those nodes is just O(1) the time to unlink two adjacent nodes and insert your new one.
For purposes of comparing with an array, which is what that chart shows, it's O(1) because you don't have to move all the items after the new node.
So yes, they are assuming that you already have the pointer to that node, or that getting the pointer is trivial. In other words, the problem is stated: "given node at X, what is the code to insert after this node?" You get to start at the insert point.
Insertion into a linked list is different than iterating across it. You aren't locating the item, you are resetting pointers to put the item in there. It doesn't matter if it is going to be inserted near the front end or near the end, the insertion still involves pointers being reassigned. It'll depend on how it was implemented, of course, but that is the strength of lists - you can insert easily. Accessing via index is where an array shines. For a list, however, it'll typically be O(n) to find the nth item. At least that's what I remember from school.
Inserting is O(1) once you know where you're going to put it.
Does this analysis not account for the finding of the node operation (though it is required) and just the insertion itself?
You got it. Insertion at a given point assumes that you already hold a pointer to the item that you want to insert after:
InsertItem(item * newItem, item * afterItem)
No, it does not account for searching. But if you already have hold of a pointer to an item in the middle of the list, inserting at that point is O(1).
If you have to search for it, you'd have to add on the time for searching, which should be O(n).
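The split between the two costs can be sketched in Java as below. This is an illustration only: `InsertDemo`, `Node`, `insertAfter`, and `nodeAt` are hypothetical names loosely modeled on the `InsertItem(newItem, afterItem)` signature mentioned above, for a singly linked list of ints.

```java
// Minimal sketch; InsertDemo and Node are hypothetical names.
class InsertDemo {
    static final class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    // O(1): given the node to insert after, only two links are assigned.
    static Node insertAfter(Node afterItem, int value) {
        Node newItem = new Node(value);
        newItem.next = afterItem.next;
        afterItem.next = newItem;
        return newItem;
    }

    // O(n): reaching the node at a given index means walking from the head.
    static Node nodeAt(Node head, int index) {
        Node current = head;
        for (int i = 0; i < index && current != null; i++) {
            current = current.next;
        }
        return current;
    }
}
```

Calling `insertAfter(nodeAt(head, k), value)` therefore costs O(n) overall, while `insertAfter` on a node you already hold (for example, the current node of an ongoing traversal) stays O(1).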
Because it does not involve any looping.
Inserting is like:
insert element
link to previous
link to next
done
this is constant time in any case.
Consequently, inserting n elements one after the other is O(n).
The most common cases are probably inserting at the beginning or at the end of the list (and the ends of the list might take no time to find).
Contrast that with inserting items at the beginning or the end of an array (which requires resizing the array if it's at the end, or resizing and moving all the elements if it's at the beginning).
The article is about comparing arrays with lists. Finding the insert position for both arrays and lists is O(N), so the article ignores it.
O(1) depends on the fact that you already have the item next to which (before or after) you will insert the new item. If you don't, it's O(n) because you must find that item first.
I think it's just a case of what you choose to count for the O() notation. For insertion, the operation normally counted is the copy. With an array, inserting in the middle involves copying everything above the location up in memory. With a linked list, this becomes setting two pointers. Either way, you need to find the location in order to insert.
If you have a reference to the node to insert after, the operation is O(1) for a linked list.
For an array it is still O(n), since you have to move all subsequent elements.
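For comparison, a minimal Java sketch of inserting into the middle of an array, where every element after the insertion point has to be copied. `ArrayInsert` and `insertAt` are hypothetical names, and the sketch allocates a new array one slot larger rather than assuming spare capacity.

```java
// Minimal sketch; ArrayInsert is a hypothetical name.
class ArrayInsert {
    // O(n): elements at and after the index are copied one position to the right.
    static int[] insertAt(int[] source, int index, int value) {
        int[] result = new int[source.length + 1];
        System.arraycopy(source, 0, result, 0, index);                             // prefix stays in place
        result[index] = value;                                                     // the new element
        System.arraycopy(source, index, result, index + 1, source.length - index); // shifted suffix
        return result;
    }
}
```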
