How to transform an out-adjacency list into an in-adjacency list? - graph-algorithm

Consider a directed graph G = (V, E). The usual representation maintains an array A[...] indexed by V, in which A[v] is a linked list. The linked list holds the names of all the nodes u to which v points, i.e., the nodes u for which (v, u) ∈ E. (Technically, A[v] contains a pointer to the first item in the linked list.) This is the default adjacency list format and can be thought of as an out-adjacency list representation.
An in-adjacency list representation would be one in which A[v] is the list of nodes that point to v.
Can anyone give me pseudocode for an O(|V| + |E|) algorithm that transforms the out-adjacency list representation into an in-adjacency list representation? Please also explain why your algorithm is correct and why it runs in O(|V| + |E|) time.

You can do this in O(V + E), but the insert operation on a linked list must then run in constant time, O(1).
That is easy to arrange. Keep a separate pointer last that points to the last element inserted into each linked list. With last, an insert (append) takes O(1) instead of the usual O(N) needed to walk to the end of the list. (Inserting at the head of the list is also O(1) and works just as well here, because the order of entries within an adjacency list does not matter.)
Now, coming to the problem, let's say our new adjacency list is adj_new, which starts out as |V| empty linked lists. We traverse the original adjacency list, beginning with the first linked list.
For each element x of linked list A[0], we do the insert operation:
insert 0 into linked list adj_new[x]
We do the above for each of the linked lists. Traversing the entire adjacency list takes O(V + E) time, because each of the V lists is visited once and each of the E list entries is visited once. Each entry triggers exactly one insert, and every insert takes O(1) time. The total time, including the O(V) needed to create the empty lists, is therefore O(V + E).
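Following is the pseudocode:
For each vertex v
    adj_new[v] = empty linked list
For each linked list A[i]
    for each element x of A[i]
        append i to linked list adj_new[x]
adj_new[] is the in-adjacency list. If you look carefully, the algorithm simply reverses the direction of each edge of your directed graph. That is also why it is correct: i is appended to adj_new[x] exactly when x appears in A[i], i.e., exactly when (i, x) ∈ E. So adj_new[x] ends up holding precisely the nodes that point to x.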
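To make the pseudocode concrete, here is a small Python sketch of the same algorithm. It assumes vertices are numbered 0..|V|-1. The Node, LinkedList, build and transpose names are illustrative and do not come from the answer. The list keeps a `last` pointer so that append is O(1), as described above.

```python
class Node:
    """One entry of an adjacency linked list."""

    def __init__(self, vertex):
        self.vertex = vertex
        self.next = None


class LinkedList:
    """Singly linked list with a tail pointer (the answer's `last`), so append() is O(1)."""

    def __init__(self):
        self.head = None
        self.last = None

    def append(self, vertex):
        node = Node(vertex)
        if self.last is None:      # empty list: the new node is both first and last
            self.head = node
        else:
            self.last.next = node
        self.last = node

    def __iter__(self):
        cur = self.head
        while cur is not None:
            yield cur.vertex
            cur = cur.next


def build(out_lists):
    """Helper: build the out-adjacency representation from plain Python lists."""
    adj = []
    for targets in out_lists:
        lst = LinkedList()
        for u in targets:
            lst.append(u)
        adj.append(lst)
    return adj


def transpose(A):
    """Return adj_new, where adj_new[x] lists every i with (i, x) in E."""
    adj_new = [LinkedList() for _ in range(len(A))]  # O(|V|)
    for i, out_list in enumerate(A):                 # each list visited once: O(|V|)
        for x in out_list:                           # each edge visited once: O(|E|) in total
            adj_new[x].append(i)                     # O(1) thanks to the tail pointer
    return adj_new


# Edges 0->1, 0->2, 1->2, 2->0
A = build([[1, 2], [2], [0], []])
print([list(lst) for lst in transpose(A)])  # [[2], [0], [0, 1], []]
```

Every edge (i, x) is visited exactly once and produces exactly one append. The output therefore contains exactly the reversed edges.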


Compute the distances between two nodes and their lowest common ancestor (LCA)

I need to compute the distance from each of two nodes A and B to their lowest common ancestor in a graph. I use the following query to find the LCA:
match p1 = (A:Category {idCat: "Main_topic"}) -[*0..]-> (common:Category) <-[*0..]- (B:Category {idCat: "Heat_transfer"})
return common, p1
Is there any function in Neo4j that returns the respective distances d(A, common) and d(B, common)?
Thank you for your help.
If I understand the lowest common ancestor correctly, this comes down to finding the shortest path between A and B with at least one node in between. You can do that with the query below. The condition that the length of p is greater than 1 forces at least one node between the two. The example uses the IMDB toy database and returns the movie Avatar.
match p=shortestPath((n:Person {name:'Zoe Saldana'})-[r*1..15]-(n1:Person {name:'James Cameron'})) where length(p) > 1 return nodes(p)[1]
Basically, you can choose any element from the nodes in the path except the first and last ones, since those will be A and B.
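The answer above only identifies the common node. To also get d(A, common) and d(B, common), one option is to run one breadth-first search from each node along the child-to-parent edges. You then pick the shared ancestor with the smallest combined distance, which is one reasonable definition of "lowest" in a DAG. This is a different technique from the shortestPath query, sketched here in Python outside Neo4j. It assumes edges point from a category to its parent, as in the question's `-[*0..]->` pattern. The `parents` dictionary is hypothetical, and so is every category name except Main_topic and Heat_transfer.

```python
from collections import deque


def ancestor_distances(parents, start):
    """Breadth-first search along child -> parent edges.

    Returns {ancestor: hop count}; start itself is included at distance 0,
    mirroring the *0.. lower bound in the question's pattern.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in dist:           # in BFS the first visit is the shortest
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def lowest_common_ancestor(parents, a, b):
    """Return (common, d(a, common), d(b, common)) for the shared ancestor with the
    smallest d(a, common) + d(b, common), or None if there is no shared ancestor."""
    da = ancestor_distances(parents, a)
    db = ancestor_distances(parents, b)
    common = [c for c in da if c in db]
    if not common:
        return None
    best = min(common, key=lambda c: (da[c] + db[c], str(c)))  # str(c) breaks ties
    return best, da[best], db[best]


# Hypothetical category hierarchy; each edge points from a category to its parent.
parents = {
    "Heat_transfer": ["Thermodynamics"],
    "Thermodynamics": ["Physics"],
    "Optics": ["Physics"],
    "Physics": ["Main_topic"],
}
print(lowest_common_ancestor(parents, "Optics", "Heat_transfer"))      # ('Physics', 1, 2)
print(lowest_common_ancestor(parents, "Main_topic", "Heat_transfer"))  # ('Main_topic', 0, 3)
```

In the second call, A is itself an ancestor of B, so d(A, common) is 0. This corresponds to the `*0..` lower bound in the question's query.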

Finding all subtrees of nodes with a common second level relationship

I am working with bill of materials (BOM) and part data in a Neo4J database.
There are 3 types of nodes in my graph:
(ItemUsageInstance) these are the elements of the bill of materials tree
(Item) one exists for each unique item on the BOM tree
(Material) the material an Item is made from
The relationships are:
The schema is pictured below:
Here is a simplified picture of the data. (Diagram with nodes repositioned to enhance visibility):
What I would like to do is find subtrees of adjacent ItemUsageInstances whose Items are all made from the same Material.
The query I have so far is:
MATCH (m:Material)
MATCH (m)<-[:MADE_FROM]-(i1:Item)<-[]-(iui1:ItemUsageInstance)-[:CHILD_OF]->(iui2:ItemUsageInstance)-[]->(i2:Item)-[:MADE_FROM]->(m) RETURN iui1, i1, iui2, i2, m
However, this only returns one such subtree, the adjacent nodes in the middle of the graph that have a common Material of "M0002". Also, the rows of the results are separate entries, one for each parent-child pair in the subtree:
│"iui1" │"i1" │"iui2" │"i2" │"m" │
I was expecting a second subtree, which happens to also be a linked list, to be included. This second subtree consists of ItemUsageInstances inst7006, inst7007, inst7008 at the far right of the graph. For what it's worth, not only are these adjacent instances made from the same Material, they are all instances of the same Item.
I confirmed that every ItemUsageInstance node has an [INSTANCE_OF] relationship to an Item node:
MATCH (iui:ItemUsageInstance) WHERE NOT (iui)-[:INSTANCE_OF]->(:Item) RETURN iui
(returns 0 records).
Also confirmed that every Item node has a [MADE_FROM] relationship to a Material node:
MATCH (i:Item) WHERE NOT (i)-[:MADE_FROM]->(:Material) RETURN i
(returns 0 records).
Confirmed that inst7008 is the only ItemUsageInstance without an outgoing [CHILD_OF] relationship.
MATCH (iui:ItemUsageInstance) WHERE NOT (iui)-[:CHILD_OF]->(:ItemUsageInstance) RETURN iui
(returns 1 record: {"instance_id":"inst7008"})
inst5000 and inst7001 are the only ItemUsageInstances without an incoming [CHILD_OF] relationship
MATCH (iui:ItemUsageInstance) WHERE NOT (iui)<-[:CHILD_OF]-(:ItemUsageInstance) RETURN iui
(returns 2 records: {"instance_id":"inst7001"} and {"instance_id":"inst5000"})
I'd like to collect/aggregate the results so that each row is a subtree. I saw this example of how to collect() and got the array method to work. But it still has duplicate ItemUsageInstances in it. (The "map of items" discussed there failed completely...)
Any insights as to why my query is only finding one subtree of adjacent item usage instances with the same material?
What is the best way to aggregate the results by subtree?
Finding the roots is easy. MATCH (root:ItemUsageInstance) WHERE NOT ()-[:CHILD_OF]->(root)
For the children, you can include the root itself by specifying a minimum distance of 0 (the default is 1).
MATCH p=(root)-[:CHILD_OF*0..25]->(ins), (m:Material)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-(ins)
Then, assuming only one item-material per instance, aggregate everything by material. You can't aggregate inside an aggregate, so use WITH to compute the depth before collecting it together with the node.
WITH ins, SIZE(NODES(p)) as depth, m RETURN COLLECT({node:ins, depth:depth}) as instances, m as material
So, all together
MATCH (m:Material)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-(ins:ItemUsageInstance)
WHERE NOT (m)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-(:ItemUsageInstance)<-[:CHILD_OF]-(ins)
MATCH p2=(ins)<-[:CHILD_OF*1..25]-(cins)
WHERE ALL(n in NODES(p2) WHERE (m)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-(n))
WITH ins, cins, SIZE(NODES(p2)) as depth, m ORDER BY depth ASC
RETURN ins as collection_head, [ins] + COLLECT(cins) as instances, m as material
Your pattern doesn't account for situations like the link between inst_5001 and inst_7001. inst_5001 has no links to any part usages, but your match pattern requires both usages to have such a link. I think this is where you're going off track. You only find the inst_5002 tree because it happens to have a link to a usage, as your pattern requires.
In terms of "aggregating by subtree", I would return the ID of the root of the tree (e.g. id(iui1)) and then count(*) the rest. That shows how many subtrees a given root participates in.
Here is my heavily edited query:
MATCH path = (cinst:ItemUsageInstance)-[:CHILD_OF*1..]->(pinst:ItemUsageInstance), (m:Material)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-(pinst)
WHERE ID(cinst) <> ID(pinst) AND ALL (x in nodes(path) WHERE ((x)-[:INSTANCE_OF]->(:Item)-[:MADE_FROM]->(m)))
WITH nodes(path) as insts, m
UNWIND insts AS instance
WITH DISTINCT instance, m
RETURN collect(instance), m
It returns what I was expecting:
│"collect(instance)" │"m" │
│[{"instance_id":"inst7007"},{"instance_id":"inst7008"},{"instance_id":"inst7006"}] │{"material_id":"M0001"}│
The one limitation is that it does not distinguish the root of the subtree from the children. Ideally the list of {"instance_id"} would be sorted by depth in the tree.
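The depth ordering asked about here can be sketched outside Neo4j. This Python sketch uses hypothetical data and function names, and assumes one material per instance, as the answers above do. An instance counts as the head of a group when its CHILD_OF parent is missing or is made from a different material. The sketch then walks down breadth-first through same-material children, so each group comes out head first and in nondecreasing depth.

```python
from collections import defaultdict, deque


def same_material_subtrees(parent, material):
    """Group adjacent instances that are made from the same material.

    parent:   {instance: parent_instance}, one entry per CHILD_OF edge (child -> parent)
    material: {instance: material}, assuming exactly one material per instance
    Returns [(material, [head, ...instances in nondecreasing depth])] for every
    maximal group of at least two adjacent instances.
    """
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)

    groups = []
    for inst, mat in material.items():
        par = parent.get(inst)
        # Not a head if the parent exists and shares the material.
        if par is not None and material.get(par) == mat:
            continue
        ordered = []
        queue = deque([inst])
        while queue:                         # breadth-first, so depth never decreases
            node = queue.popleft()
            ordered.append(node)
            for child in children[node]:
                if material.get(child) == mat:
                    queue.append(child)
        if len(ordered) > 1:                 # a lone instance is not an "adjacent" group
            groups.append((mat, ordered))
    return groups


# Hypothetical data: c -> b -> a and e -> d -> a, where "->" is CHILD_OF.
parent = {"c": "b", "b": "a", "d": "a", "e": "d"}
material = {"a": "M1", "b": "M1", "c": "M1", "d": "M2", "e": "M2"}
print(same_material_subtrees(parent, material))
# [('M1', ['a', 'b', 'c']), ('M2', ['d', 'e'])]
```

The head test plays the same role as the NOT pattern in the Cypher answer. A group ends where the material changes, so the head of each group is known and the list can be reported root first.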

Query node properties into lists, related node properties into list of lists, tracking hierarchy

Given the following schema / data set:
I'm trying to assemble a query that returns some properties of a, a list of b properties (a list of strings), and finally a list of c property lists (a list of lists of strings). I'm pretty close using collect(), but I can't keep track of which c's belong to which b's.
The query I want would produce a single row per a (2 rows for the given data set). Note that the data can be sparse, so an empty array in the results preserves the hierarchy:
"A1", ["B1", "B2"], [["C1","C2"],["C3"]]
"A2", ["B3", "B4"], [[],["C4"]]
When you aggregate using COLLECT (or any other aggregation), the other, uncollected (unaggregated) values in the row serve as the aggregation key. Only rows that share all the other values are grouped together. For your query, you need to stack COLLECTs in two separate steps. First, get lists of c keyed by a and b. Then collect all of the bs and all of the lists of cs, keyed by a, like so:
MATCH (a) - [:ONE] -> (b)
OPTIONAL MATCH (b) - [:TWO] -> (c)
WITH a, b, COLLECT(c.property) AS cs
WITH a, COLLECT(b.property) AS bs, COLLECT(cs) AS cs_per_b
RETURN a.property, bs, cs_per_b
You can replace property with whatever property you want to get from the node. If it's not a node property but a label or some other value, replace the whole expression inside COLLECT( ). This way you'll also get empty lists inside cs_per_b where there are no cs, as desired.
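To see why the two stacked COLLECT steps produce the nesting, here is a Python sketch that mimics the grouping with dictionaries. The rows stand in for the (a, b, c) matches implied by the sample results, with c set to None where the OPTIONAL MATCH finds nothing. `nested_collect` is a hypothetical name. Like Cypher's collect(), stage 1 skips nulls, which is what leaves an empty list for B3.

```python
def nested_collect(rows):
    """Mimic: WITH a, b, collect(c) AS cs  then  WITH a, collect(b) AS bs, collect(cs) AS cs_per_b."""
    # Stage 1: grouping key is (a, b); like Cypher's collect(), skip nulls.
    per_b = {}
    for a, b, c in rows:
        cs = per_b.setdefault((a, b), [])
        if c is not None:
            cs.append(c)
    # Stage 2: grouping key is a alone; collect the b's and their c-lists side by side.
    per_a = {}
    for (a, b), cs in per_b.items():
        bs, cs_per_b = per_a.setdefault(a, ([], []))
        bs.append(b)
        cs_per_b.append(cs)
    return [(a, bs, cs_per_b) for a, (bs, cs_per_b) in per_a.items()]


# One row per (a, b, c) match; c is None where OPTIONAL MATCH found no TWO relationship.
rows = [
    ("A1", "B1", "C1"), ("A1", "B1", "C2"), ("A1", "B2", "C3"),
    ("A2", "B3", None), ("A2", "B4", "C4"),
]
for row in nested_collect(rows):
    print(row)
# ('A1', ['B1', 'B2'], [['C1', 'C2'], ['C3']])
# ('A2', ['B3', 'B4'], [[], ['C4']])
```

bs and cs_per_b are appended in the same loop, so their positions line up. That alignment is how the hierarchy is preserved. Python dicts keep insertion order, whereas Cypher does not guarantee row order without an ORDER BY.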
Although your question states you want to list node "properties", your sample results list node labels instead.
To display the node labels, the following query should work:
MATCH (a)-[:ONE]->(b)
The query assumes that it is sufficient to use the ONE and TWO relationship types to distinguish between the Ax, Bx and Cx node labels, and that those nodes only have a single label. It uses an OPTIONAL MATCH for the TWO relationship since your sample results imply that it is optional.

Pairs from a directed acyclic Neo4j graph

I have a DAG which for the most part is a tree... but there are a few cycles in it. I mention it in case it matters.
I have to translate the graph into pairs of relations. If:
A -> B, C, D
D -> 1, 2
2 -> X, Y
Then I would produce ArB, ArC, ArD, Dr1, Dr2, 2rX, 2rY, where r is some relationship information (in other words, the query cannot totally ignore it).
Also, in my graph, node A has many cousins, so I need to 'anchor' my query to A.
My current attempt generates all possible pairs, so I get many unhelpful pairs such as ArY since A can eventually traverse to Y.
What is a query that starts (or ends) with A, that returns a list of pairs? I don't want to query Neo individually for each node - I want to get the list in one shot if possible.
A query would be great, and doc pages that explain it would be great too. Any help is appreciated.
EDIT: Here's what I have so far, using Frobber's post as inspiration:
1. MATCH p=(n {id:"some_id"})-[*]->(m)
2. WITH DISTINCT(NODES(p)) as zoot
3. MATCH (x)-[r]->(y)
4. WHERE x IN zoot AND y IN zoot
5. RETURN DISTINCT x, TYPE(r) as r, y
In line 1, I match paths that include all the nodes under the one I care about.
In line 2, I convert the nodes of each path into a collection of nodes.
In line 3, I start a new match that is intended to return my pairs.
In line 4, I accept only x and y nodes that were scooped up by the first match. I am not sure why I have to include y in the condition, but it seems to matter.
In line 5, I return the results. I do not know why I need a DISTINCT here; I thought the one on line 2 would do the trick.
So far, this is working for me. I have no insight into its performance in a large graph.
Here's an approach to try. This query is modeled on the sample Matrix data you can find online, so you can play with it before adapting it to your schema.
MATCH p=(n:Crew {name: "Neo"})-[r:KNOWS*]->(m)
WITH p, size(nodes(p)) AS nCount, size(relationships(p)) AS rCount
ORDER BY length(p) ASC
RETURN nodes(p)[nCount-2], relationships(p)[rCount-1], nodes(p)[nCount-1];
A couple of notes about what's going on here:
Consider the "Neo" node to be your "A" here. You're rooting this path traversal in some particular node you pick out.
We're matching paths, not nodes or edges.
We're going through all paths rooted at the A node, ordering by path length. This gets the near nodes before the distant nodes.
For each path we find, we look at the nodes and relationships in the path and return the last pair: the second-to-last node (nodes(p)[nCount-2]), the last relationship (relationships(p)[rCount-1]), and the last node (nodes(p)[nCount-1]).
This query basically returns the node, the relationship, and the connected node showing that you can get those items; from there you just customize the query to pull out whatever about those nodes/rels you might need pursuant to your schema.
The basic formula starts with matching p=(someNode {startingPoint: "A"})-[r*]->(otherStuff). From there, it's just a matter of processing paths.
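The idea of emitting each edge once while staying anchored to A can also be sketched as a plain breadth-first search. This is a Python illustration outside Neo4j, not a Cypher query. `edge_pairs` is a hypothetical name. The graph is the one implied by the pairs listed in the question, with every relationship labeled r. Each node is dequeued only once, so each reachable node's outgoing edges are listed exactly once, even when that node can be reached along several routes.

```python
from collections import deque


def edge_pairs(graph, anchor):
    """Breadth-first walk from `anchor` that reports every traversed edge once.

    graph: {node: [(rel_type, target), ...]} (outgoing edges only)
    Returns (source, rel_type, target) triples, nearest edges first.
    """
    seen = {anchor}
    queue = deque([anchor])
    pairs = []
    while queue:
        node = queue.popleft()               # each node is expanded exactly once
        for rel, target in graph.get(node, []):
            pairs.append((node, rel, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pairs


graph = {
    "A": [("r", "B"), ("r", "C"), ("r", "D")],
    "D": [("r", "1"), ("r", "2")],
    "2": [("r", "X"), ("r", "Y")],
}
print([s + r + t for s, r, t in edge_pairs(graph, "A")])
# ['ArB', 'ArC', 'ArD', 'Dr1', 'Dr2', '2rX', '2rY']
```

Unlike enumerating every path, this never produces a pair such as ArY. Only direct edges are reported, and a node shared by two branches is expanded once.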

Time complexity of node deletion in singly- and doubly-linked lists

Why is the time complexity of node deletion in doubly linked lists (O(1)) faster than node deletion in singly linked lists (O(n))?
The problem assumes that the node to be deleted is known and a pointer to that node is available.
In order to delete a node and connect the previous and the next node together, you need to know their pointers. In a doubly-linked list, both pointers are available in the node that is to be deleted. The time complexity is constant in this case, i.e., O(1).
Whereas in a singly-linked list, the pointer to the previous node is unknown and can be found only by traversing the list from head until it reaches the node that has a next node pointer to the node that is to be deleted. The time complexity in this case is O(n).
In cases where the node to be deleted is known only by value, the list has to be searched and the time complexity becomes O(n) in both singly- and doubly-linked lists.
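The difference shows up in a small Python sketch (class and function names are illustrative). Deleting from a singly linked list has to walk from the head to find the predecessor, while a doubly linked node already carries both neighbours. Both functions assume the node is actually in the list and return the possibly new head.

```python
class SNode:
    """Singly linked node: only a forward pointer."""
    def __init__(self, value):
        self.value = value
        self.next = None


class DNode:
    """Doubly linked node: forward and backward pointers."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def delete_singly(head, node):
    """O(n): walk from the head to find node's predecessor. Returns the new head."""
    if head is node:
        return head.next
    cur = head
    while cur.next is not node:      # this walk is what makes it O(n)
        cur = cur.next
    cur.next = node.next
    return head


def delete_doubly(head, node):
    """O(1): node.prev and node.next are already known. Returns the new head."""
    if node.prev is not None:
        node.prev.next = node.next
    else:                            # deleting the head
        head = node.next
    if node.next is not None:
        node.next.prev = node.prev
    node.prev = node.next = None
    return head


def build_singly(values):
    nodes = [SNode(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes[0], nodes


def build_doubly(values):
    nodes = [DNode(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
        b.prev = a
    return nodes[0], nodes


def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


head, nodes = build_singly([1, 2, 3, 4])
print(to_list(delete_singly(head, nodes[2])))    # [1, 2, 4]
dhead, dnodes = build_doubly([1, 2, 3, 4])
print(to_list(delete_doubly(dhead, dnodes[2])))  # [1, 2, 4]
```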
Actually deletion in singly linked lists can also be implemented in O(1).
Given a singly linked list with the following state:
Node 1 -> Node 2
Node 2 -> Node 3
Node 3 -> Node 4
Node 4 -> None
Head = Node 1
We can implement delete Node 2 in such a way:
Node 2 Value <- Node 3 Value
Node 2 -> Node 4
Here we replace the value of Node 2 with the value of its next node (Node 3). We then set Node 2's next pointer to Node 3's next pointer (Node 4), skipping over the now effectively "duplicate" Node 3. No traversal is needed. Note that this trick cannot delete the tail node, because there is no next node to copy from.
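Here is a Python sketch of this copy-the-successor trick (the names are hypothetical). It raises an error for the tail, since there is nothing to copy from.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def delete_by_copying_next(node):
    """Remove `node`'s value from a singly linked list in O(1) without the head.

    Copies the successor's value into `node`, then unlinks the successor.
    Cannot be used on the tail, which has no successor to copy from.
    """
    if node.next is None:
        raise ValueError("the tail node cannot be deleted this way")
    successor = node.next
    node.value = successor.value     # Node 2 Value <- Node 3 Value
    node.next = successor.next       # Node 2 -> Node 4


def values(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


node4 = Node(4)
node3 = Node(3, node4)
node2 = Node(2, node3)
node1 = Node(1, node2)               # Head = Node 1
delete_by_copying_next(node2)
print(values(node1))                 # [1, 3, 4]
```

Be aware of one side effect. The node object that originally held Node 3's value is the one that leaves the list, so any outside reference to it now points at a detached node.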
Because you can't look backwards...
Insertion and deletion at a known position is O(1). However, finding that position is O(n), unless it is the head or tail of the list.
When we talk about insertion and deletion complexity, we generally assume we already know where that's going to occur.
It has to do with the complexity of fixing up the next pointer in the node previous to the one you're deleting.
Unless the element to be deleted is the head (first) node, we need to traverse to the node before the one to be deleted. In the worst case, when we need to delete the last node, the pointer has to go all the way to the second-to-last node. That traverses (n-1) positions, which gives a time complexity of O(n).
I don't think it's O(1) unless you know the address of the node that has to be deleted. Otherwise, don't you have to loop from the head to reach it?
It is O(1) provided you have the address of the node to be deleted, because that node holds both its prev link and its next link.
With all the necessary links available, just take the "node of interest" out of the list by rearranging the links, and then free() it.
In a singly linked list, however, you have to traverse from the head to find the previous node. This is true whether you are given the address of the node to be deleted or only its position (1st, 2nd, 10th, etc.).
Suppose there is a linked list from 1 to 10 and you have to delete node 5 whose location is given to you.
1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9 -> 10
You will have to connect the next pointer of 4 to 6 in order to delete 5.
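Doubly Linked list
You can use the previous pointer on 5 to go to 4. Then you can do
4->next = 5->next;
6->prev = 4;
In code (assuming givenNode is not the head):
Node* temp = givenNode->prev;
temp->next = givenNode->next;
if (givenNode->next != NULL)
    givenNode->next->prev = temp;
free(givenNode);
Time Complexity = O(1)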
Singly Linked list
A singly linked list has no previous pointer, so you can't go backwards. You have to traverse the list from the head instead (assuming givenNode is not the head itself):
Node* temp = head;
while (temp->next != givenNode)   // stop at the node just before givenNode
    temp = temp->next;
temp->next = givenNode->next;
free(givenNode);
Time Complexity = O(N)
In LRU cache design, deletion in a doubly linked list takes O(1) time. An LRU cache is implemented with a hash map and a doubly linked list: the doubly linked list stores the values, and the hash map stores pointers to the linked list nodes.
On a cache hit, we have to move the element to the front of the list. The node may be somewhere in the middle of the doubly linked list. Because its pointer is kept in the hash map, we can retrieve it in O(1) time and delete it by linking its previous node directly to its next node.
We then set the node's own pointers to None,
and then reinsert it at the front of the linked list.
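Here is a minimal Python sketch of such an LRU cache. The class and method names are illustrative, not a particular library's API. It uses a dict for key-to-node lookup and a doubly linked list with sentinel head and tail nodes. Moving a hit to the front and evicting from the back are therefore both O(1). It assumes a capacity of at least 1 and returns None on a miss.

```python
class _Node:
    """Doubly linked list node; the hash map points straight at these."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    """dict (key -> node) plus a doubly linked list ordered from most to least recent."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.map = {}
        # Sentinels: head.next is the most recent entry, tail.prev the least recent.
        self.head = _Node()
        self.tail = _Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node):
        # O(1): both neighbours are reachable from the node itself.
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = None

    def _push_front(self, node):
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None             # cache miss
        self._unlink(node)          # cache hit: move the node to the front
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
        else:
            if len(self.map) >= self.capacity:
                lru = self.tail.prev    # least recently used entry
                self._unlink(lru)
                del self.map[lru.key]
            node = _Node(key, value)
            self.map[key] = node
        self._push_front(node)


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # returns 1 and makes "a" the most recent entry
cache.put("c", 3)       # evicts "b", the least recently used entry
print(cache.get("b"))   # None
```

The sentinels mean _unlink and _push_front never have to special-case an empty list or the ends of the list. That keeps every operation a fixed number of pointer updates.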
