Having More than One Info Part in a LinkedList

We have been given an assignment to implement a priority queue using linked lists. The logic in my mind is that if I add two info parts to the node, one containing the data to print and the other storing a key to prioritize the node, then I can dequeue nodes according to their priority.
Now I am just confused as to whether it is legal to add two info parts to a single node.
Like this:
private class Node {
    private int priority; // key used to order the queue
    private String job;   // payload to print when dequeued
    private Node next;
}
If it is a doubly linked list then the reverse pointer is also necessary.

It's certainly fine to put two pieces of information in a node in a linked list. In fact, if you're building a priority queue, you will probably need some kind of priority 'key' to order your queue, as well as a 'value', (a.k.a. 'data' or 'payload') that the node is holding onto for later use.
In your case, the String is the value and the int is the key / priority. You can think of this node as having one piece of information (the String), besides its key.
If that's not exactly what you're after, you could make a more flexible linked list that could hold any data in its node, including a single piece of data that contained both an int and a String. This could therefore be used for a priority queue or any other kind of abstract data structure built on a linked list.
Your code looks like Java, so if you'd like to know how to make this more flexible node, you can look into Java generics.
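For example, a minimal sketch of such a generic node (the names here are illustrative):

private class Node<T> {
    private int priority; // key used to order the queue
    private T value;      // payload of any type
    private Node<T> next;
}

A priority queue would then use Node<String>, while the same list class could carry any other payload type.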

Related

How to define a hash table?

I'm having trouble understanding why we* use node as the data type.
*(I'm doing CS50, and while solving the problem sets it's given like this)
node *hashtable[50];
(here node refers to a linked list node)
Since we are just storing a pointer to a linked list in it, wouldn't it be better to define it as just an array of char*?
char *hashtable[50];
Hashing functions have collisions. When a key hashes to an index where the table is already occupied, one strategy to resolve the collision is to keep a linked list at that index and simply append to it.
There are other collision resolution strategies, but the separate chaining strategy is probably the simplest.
In order to be able to treat the hash table items as linked lists, they need to have at least a next pointer in addition to their payload. Hence the items need to be some kind of struct node* rather than the payload type directly.
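To make the idea concrete, here is a rough sketch of separate chaining, written in Java for illustration (CS50's actual code is C, and all names here are made up):

class Node {
    String key;
    String value;
    Node next; // next item that hashed to the same bucket
}

class ChainedHashTable {
    Node[] hashtable = new Node[50]; // each slot is the head of a linked list

    void insert(String key, String value) {
        int index = (key.hashCode() & 0x7fffffff) % hashtable.length; // non-negative bucket index
        Node n = new Node();
        n.key = key;
        n.value = value;
        n.next = hashtable[index]; // prepend to the chain in this bucket
        hashtable[index] = n;
    }
}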

LRU cache with a singly linked list

Most LRU cache tutorials emphasize using both a doubly linked list and a dictionary in combination. The dictionary holds both the value and a reference to the corresponding node on the linked list.
When we perform a remove operation, we look the node up in the dictionary and then have to remove it from the linked list.
Now here's where it gets weird. Most tutorials argue that we need the preceding node in order to remove the current node from the linked list, and that this is why a doubly linked list is needed to achieve O(1) time.
However, there is a way to remove a node from a singly linked list in O(1) time: we copy the next node's value into the current node and then delete the next node.
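In code, the trick looks roughly like this (a sketch; Node and its fields are illustrative names):

void deleteByCopy(Node node) {
    Node succ = node.next;   // assumes node is not the tail
    node.key = succ.key;     // overwrite the node with its successor's contents
    node.value = succ.value;
    node.next = succ.next;   // unlink the successor
}

(The one caveat is that it does not work on the last node, and any outside references to the successor go stale.)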
My question is: why do all these tutorials show how to implement an LRU cache with a doubly linked list when we could save constant space by using a singly linked list?
You are correct: a singly linked list can be used instead of a doubly linked list, as the following approach shows:
The standard way is a hashmap pointing into a doubly linked list to make delete easy. To do it with a singly linked list without using an O(n) search, have the hashmap point to the preceding node in the linked list (the predecessor of the one you care about, or null if the element is at the front).
Retrieve list node:
node = hashmap(key) ? hashmap(key)->next : list.head
Delete (glossing over the boundary cases at the front and back of the list):
successornode = hashmap(key)->next->next
hashmap(successornode->key) = hashmap(key)
hashmap(key)->next = successornode
hashmap.delete(key)
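A rough Java sketch of that delete, with the boundary cases handled (all names here are illustrative, not from the quoted answer):

import java.util.HashMap;
import java.util.Map;

class Node {
    int key;
    String value;
    Node next;
}

class SinglyLinkedLru {
    Node head;
    // key -> predecessor of that key's node; a null value means the node is at the front
    Map<Integer, Node> prevOf = new HashMap<>();

    void delete(int key) {
        Node prev = prevOf.get(key);
        Node node = (prev != null) ? prev.next : head;
        Node succ = node.next;
        if (prev != null) prev.next = succ; else head = succ;
        if (succ != null) prevOf.put(succ.key, prev); // the successor's predecessor changed
        prevOf.remove(key);
    }
}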
Why is the doubly linked list so common in LRU solutions then? It is easier to understand and use.
If optimization is an issue, then the trade-off of the slightly less simple singly linked list solution is definitely worth it.
There are a few complications with swapping the payload instead:
The payload could be large (such as buffers).
Part of the application code may still refer to the payload (have it pinned).
There may be locks or mutexes involved (which can be owned by both the DLL/hash nodes and/or the payload).
In any case, modifying the DLL affects at most 2*2 pointers; swapping the payload needs (a memcpy for the swap plus) walking the hash chain (twice), which could need access to any node in the structure.

In a linked list, why don't we give a name to each node?

I see people usually use a temp node to manipulate a linked list. For example, create a new node whose pointer is stored in temp, point the previous node to temp, then reuse temp for the next node.
Why not keep a designated name for each node (i.e., a variable that stores its address), so that we can access a node by simply dereferencing its name? That way we can still insert a new node by pointing the previous node to it and pointing it to the next node.
I know there is a reason why linked lists are not made this way, I just can't figure out why.
The linked list data type is simply not made for having a name for each item. In many cases you just don't need to name everything. If you need such behavior you can extend the type for your needs.
It all comes down to: use the data structure that fits your actual use case.
In Java, for example, there is a pre-defined type which does exactly what you have described:
LinkedHashMap<K, V>
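For example (a minimal sketch):

import java.util.LinkedHashMap;

public class NamedNodes {
    public static void main(String[] args) {
        LinkedHashMap<String, Integer> nodes = new LinkedHashMap<>();
        nodes.put("first", 1);  // each entry is addressable by name
        nodes.put("second", 2); // while insertion order is preserved, like a linked list
        System.out.println(nodes.get("second")); // direct access by name, no traversal
    }
}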

Using complex objects in Dataflow

We have several BigQuery tables that we're reading from through Dataflow. At the moment those tables are flattened and a lot of the data is repeated. In Dataflow, all operations must be idempotent, so any output only depends on the input to the function; there's no state kept anywhere else. This is why it makes sense to first group together all the records that belong together, and in our case, this probably means creating complex objects.
An example of a complex object (there are many other types like this, and we can have millions of instances of each type):
Customer {
    customerId
    address {
        street
        zipcode
        region
        ...
    }
    first_name
    last_name
    ...
    contactInfo: {
        "phone1": { type, number, ... },
        "phone2": { type, number, ... }
    }
}
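As a sketch, such an entity would map onto plain serializable Java classes along these lines (the class and field names just mirror the example above; Dataflow can encode such objects with SerializableCoder, for instance):

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class Address implements Serializable {
    String street, zipcode, region;
}

class Phone implements Serializable {
    String type;
    String number;
}

class Customer implements Serializable {
    String customerId;
    Address address;
    String firstName, lastName;
    Map<String, Phone> contactInfo = new HashMap<>(); // "phone1", "phone2", ...
}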
The examples we found for Dataflow only process very simple objects, and they demonstrate counting, summing and averaging.
In our case, we eventually want to use Dataflow to perform more complicated processing in accordance with sets of rules. Those rules apply to the full content of a customer, invoice or order, for example, and eventually produce a whole set of indicators, sums and other items.
We considered doing this 100% in BigQuery, but it gets very messy very quickly due to the rules that apply per entity.
At this time I'm still wondering whether Dataflow is really the right tool for this job. There are almost no examples for Dataflow that demonstrate how it's used for these types of more complex objects with one or two collections. The closest I found was the use of a "LogMessage" object for log processing, but that didn't have any collections and therefore didn't do any hierarchical processing.
The biggest problem we're facing is hierarchical processing. We're reading data like this:
customerid ... street zipcode region ... phoneid type number
1          ... a      b       c      ... phone1  1    555-2424
1          ... a      b       c      ... phone2  1    555-8181
And the first operation should be to group those rows together to construct a single entity, so we can make our operations idempotent. What is the best way to do that in Dataflow? Or can you point us to an example that does it?
You can use any object as the elements in a Dataflow pipeline. The TrafficMaxLaneFlow example uses a complex object (although it doesn't have a collection).
In your example you would do a GroupByKey to group the elements. The result is a KV<K, Iterable<V>>. The KV here is just an object and has a collection-like value inside. You could then take that KV<K, Iterable<V>> and turn it into whatever kind of objects you wanted.
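As a rough sketch with the 1.x Java SDK (rows is assumed to be a PCollection<PhoneRow> already read from BigQuery; PhoneRow, Customer and Customer.fromRows are made-up names):

import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.GroupByKey;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.KV;
import com.google.cloud.dataflow.sdk.values.PCollection;

// Key each flattened row by its customer id.
PCollection<KV<String, PhoneRow>> keyed = rows.apply(
    ParDo.of(new DoFn<PhoneRow, KV<String, PhoneRow>>() {
        @Override
        public void processElement(ProcessContext c) {
            c.output(KV.of(c.element().customerId, c.element()));
        }
    }));

// Group, then fold each group into one complex Customer object.
PCollection<Customer> customers = keyed
    .apply(GroupByKey.<String, PhoneRow>create()) // KV<String, Iterable<PhoneRow>>
    .apply(ParDo.of(new DoFn<KV<String, Iterable<PhoneRow>>, Customer>() {
        @Override
        public void processElement(ProcessContext c) {
            c.output(Customer.fromRows(c.element().getKey(), c.element().getValue()));
        }
    }));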
The only thing to be aware of is that if you have very few elements that are really big you may run into some parallelism limits. Specifically, each element needs to be small enough to be processed on a single machine.
You may also be interested in withoutFlatteningResults on BigQueryIO. It only supports reading from a query (rather than a table) but it should provide the results without flattening.

Data management in the node of a doubly linked list

I am implementing a doubly linked list library. I have a data element in each node. I have a function
InitList(ListPtr), which takes the listPtr passed in, initializes the first and last elements, and sets their data to 1 and 2 respectively.
Now, if I append a node, I set the data in the appended node to 3 and make it the last node.
I was also thinking of a function Insert(ListPtr, node). This node would have some number, say 4, and let us say the list already has 10 nodes. I insert the node at the 4th position and increment the data of all the remaining nodes up to the last one by 1.
My question is: what if I have 100 nodes in the list? Each time I do an insert I will be doing this data management.
Is it supposed to be done at all, i.e., do I need to care about the data at all? It helped me during the initial development process, but now it seems like it is not necessary.
Please let me know your thoughts.
//Each node
typedef struct Node
{
    int data;
    struct Node *next; // use struct Node here; the node_t typedef is not complete yet
    struct Node *prev;
} node_t;
//List always begins with first and last nodes
typedef struct List
{
    node_t *first; // Pointer to first node in List
    node_t *last;  // Pointer to last node in List
} list_t;
Your question seems a bit confusing/under-explained. What kind of data do you have? Why do you want to keep position numbers in that data? As far as I can tell, you want to sort data that is position sensitive. First of all, I don't think you need a doubly linked list; a simple singly linked list would do that for you. You may also have the option of using dynamic arrays instead. But if you really want to implement a linked list, then you should not make it more complex by introducing a doubly linked list, as they are more difficult to manage.
I would be able to answer better if you told me about the nature of your data and what you want to implement. To me, there is definitely something wrong with the logic.
