Finding all subtrees of nodes with a common second level relationship

Finding all subtrees of nodes with a common second level relationship - neo4j

I am working with bill of materials (BOM) and part data in a Neo4J database.
There are 3 types of nodes in my graph:
(ItemUsageInstance) these are the elements of the bill of materials tree
(Item) one exists for each unique item on the BOM tree
(Material)
The relationships are:
(ItemUsageInstance)-[CHILD_OF]->(ItemUsageInstance)
(ItemUsageInstance)-[INSTANCE_OF]->(Item)
(Item)-[MADE_FROM]->(Material)
The schema is pictured below:
Here is a simplified picture of the data. (Diagram with nodes repositioned to enhance visibility):
What I would like to do is find subtrees of adjacent ItemUsageInstances whose Itemss are all made from the same Materials
The query I have so far is:
MATCH (m:Material)
WITH m AS m
MATCH (m)<-[:MADE_FROM]-(i1:Item)<-[]-(iui1:ItemUsageInstance)-[:CHILD_OF]->(iui2:ItemUsageInstance)-[]->(i2:Item)-[:MADE_FROM]->(m) RETURN iui1, i1, iui2, i2, m
However, this only returns one such subtree, the adjacent nodes in the middle of the graph that have a common Material of "M0002". Also, the rows of the results are separate entries, one for each parent-child pair in the subtree:
╒══════════════════════════╤══════════════════════╤══════════════════════════╤══════════════════════╤═══════════════════════╕
│"iui1" │"i1" │"iui2" │"i2" │"m" │
╞══════════════════════════╪══════════════════════╪══════════════════════════╪══════════════════════╪═══════════════════════╡
│{"instance_id":"inst5002"}│{"part_number":"p003"}│{"instance_id":"inst7003"}│{"part_number":"p004"}│{"material_id":"M0002"}│
├──────────────────────────┼──────────────────────┼──────────────────────────┼──────────────────────┼───────────────────────┤
│{"instance_id":"inst7002"}│{"part_number":"p003"}│{"instance_id":"inst7003"}│{"part_number":"p004"}│{"material_id":"M0002"}│
├──────────────────────────┼──────────────────────┼──────────────────────────┼──────────────────────┼───────────────────────┤
│{"instance_id":"inst7001"}│{"part_number":"p002"}│{"instance_id":"inst7002"}│{"part_number":"p003"}│{"material_id":"M0002"}│
└──────────────────────────┴──────────────────────┴──────────────────────────┴──────────────────────┴───────────────────────┘
I was expecting a second subtree, which happens to also be a linked list, to be included. This second subtree consists of ItemUsageInstances inst7006, inst7007, inst7008 at the far right of the graph. For what it's worth, not only are these adjacent instances made from the same Material, they are all instances of the same Item.
I confirmed that every ItemUsageInstance node has an [INSTANCE_OF] relationship to an Item node:
MATCH (iui:ItemUsageInstance) WHERE NOT (iui)-[:INSTANCE_OF]->(:Item) RETURN iui
(returns 0 records).
Also confirmed that every Item node has a [MADE_FROM] relationship to a Material node:
MATCH (i:Item) WHERE NOT (i)-[:MADE_FROM]->(:Material) RETURN i
(returns 0 records).
Confirmed that inst7008 is the only ItemUsageInstance without an outgoing [CHILD_OF] relationship.
MATCH (iui:ItemUsageInstance) WHERE NOT (iui)-[:CHILD_OF]->(:ItemUsageInstance) RETURN iui
(returns 1 record: {"instance_id":"inst7008"})
inst5000 and inst7001 are the only ItemUsageInstances without an incoming [CHILD_OF] relationship
MATCH (iui:ItemUsageInstance) WHERE NOT (iui)<-[:CHILD_OF]-(:ItemUsageInstance) RETURN iui
(returns 2 records: {"instance_id":"inst7001"} and {"instance_id":"inst5000"})
I'd like to collect/aggregate the results so that each row is a subtree. I saw this example of how to collect() and got the array method to work. But it still has duplicate ItemUsageInstances in it. (The "map of items" discussed there failed completely...)
Any insights as to why my query is only finding one subtree of adjacent item usage instances with the same material?
What is the best way to aggregate the results by subtree?

Finding the roots is easy. MATCH (root:ItemUsageInstance) WHERE NOT ()-[:CHILD_OF]->(root)
And for the children, you can include the root by specifying a min distance of 0 (default is 1).
MATCH p=(root)-[:CHILD_OF*0..25]->(ins), (m:Material)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-(ins)
And then assuming only one item-material per instance, aggregate everything based on material (You can't aggregate in an aggregate, so use WITH to get the depth before collecting the depth with the node)
WITH ins, SIZE(NODES(p)) as depth, m RETURN COLLECT({node:ins, depth:depth}) as instances, m as material
So, all together
MATCH (root:ItemUsageInstance),
p=(root)<-[:CHILD_OF*0..25]-(ins),
(m:Material)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-(ins)
WHERE NOT ()<-[:CHILD_OF]-(root)
AND NOT (m:Material)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-()<-[:CHILD_OF]-(ins)
MATCH p2=(ins)<-[:CHILD_OF*1..25]-(cins)
WHERE ALL(n in NODES(p2) WHERE (m)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-(n))
WITH ins, cins, SIZE(NODES(p2)) as depth, m ORDER BY depth ASC
RETURN ins as collection_head, ins+COLLECT(cins) as instances, m as material

In your pattern, you don't account for situations like the link between inst_5001 and inst_7001. Inst_5001 doesn't have any links to any part usages, but your match pattern requires that both usages have such a link. I think this is where you're going off track. The inst_5002 tree you're finding because it happens to have a link to an usage as your pattern requires.
In terms of "aggregating by subtree", I would return the ID of the root of the tree (e.g. id(iui1) and then count(*) the rest, to show how many subtrees a given root participates in.

Here is my heavily edited query:
MATCH path = (cinst:ItemUsageInstance)-[:CHILD_OF*1..]->(pinst:ItemUsageInstance), (m:Material)<-[:MADE_FROM]-(:Item)<-[:INSTANCE_OF]-(pinst)
WHERE ID(cinst) <> ID(pinst) AND ALL (x in nodes(path) WHERE ((x)-[:INSTANCE_OF]->(:Item)-[:MADE_FROM]->(m)))
WITH nodes(path) as insts, m
UNWIND insts AS instance
WITH DISTINCT instance, m
RETURN collect(instance), m
It returns what I was expecting:
╒═════════════════════════════════════════════════════════════════════════════════════════════════════════════╤═══════════════════════╕
│"collect(instance)" │"m" │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════╪═══════════════════════╡
│[{"instance_id":"inst7002"},{"instance_id":"inst7003"},{"instance_id":"inst7001"},{"instance_id":"inst5002"}]│{"material_id":"M0002"}│
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────┤
│[{"instance_id":"inst7007"},{"instance_id":"inst7008"},{"instance_id":"inst7006"}] │{"material_id":"M0001"}│
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────┘
The one limitation is that it does not distinguish the root of the subtree from the children. Ideally the list of {"instance_id"} would be sorted by depth in the tree.

Related

Leaf nodes and paths in cypher

I have a graph in Neo4J that looks like this:
(a {flag:any})<- (many, 0 or more) <-(b {flag:true})<- (many, 0 or more) <-(c {flag: any})
-OR-
(a {flag:any})<- (many, 0 or more) <-(d)
-OR-
(a {flag:any})
Where a, b, c, and d all have the same type, and the relations are also the same. All the nodes have flag:false except where noted. Of course the real graph is a tree, not a vine.
In short, every path should begin with a and end with the first flag=true node, or should begin with a and get all children down to the leaf of the tree. Per the last example, a doesn't have to have any children - it can be a root and a leaf. Finally, in the first case, we'll never pull in c. b stops the traversal.
How can I write this query?
I have gotten it to work with a path and several unwind/collect statements that are basically horse****, lol. I want a better query, but I am so confused now it is not going to happen.

The following query should return all 3 kinds of paths. I assume that all relevant nodes are labeled Foo, and all relevant relationships have the BAR type.
The first term of the WHERE clause looks for paths (of length 0 or more, because of the variable-length relationship pattern used in the MATCH clasue) that end in a node with a true flag with no true flags earlier in the path (except for possibly the starting node). The second term looks for paths (of length 0 or more) ending with a leaf node, where no nodes (except for possibly the starting node) have a true flag.
MATCH p=(a:Foo)<-[:BAR*0..]-(b:Foo)
WHERE
(b.flag AND NONE(x IN NODES(p)[1..-1] WHERE x.flag)) OR
((NOT (b)<-[:BAR]-()) AND NONE(y IN NODES(p)[1..] WHERE y.flag))
RETURN p;
NOTE: Variable-length relationship patterns with no upper bound (like [:BAR*0..]) can be very expensive, and can take a very long time or cause an out of memory error. So, you may need to specify a reasonable upper bound (for example, [:BAR*0..5]).

I would approach this query as the UNION of the two cases:
MATCH shortestPath((a)<-[:REL_TYPE*1..]-(end:Label {flag: true}))
RETURN a, end
UNION
MATCH (a)<-[:REL_TYPE*0..]-(end:Label)
WHERE NOT (end)<-[:REL_TYPE]-()
RETURN a, end
Let's break it down:
To express that we only want to traverse until the first flag is true, we use shortestPath.
To express that we want to traverse down to the leaf, we use the following formalisation: a node is a leaf if it has no relationships that could be continued, captured by a WHERE NOT filter on patterns.
This should give an idea of the basic ideas to use for such queries -- please provide some feedback so that I can refine the answer.

neo4j - restrict query based on node's rank

I have a hierarchical structure of nodes, which all have a custom-assigned sorting property (numeric). Here's a simple Cypher query to recreate:
merge (p {my_id: 1})-[:HAS_CHILD]->(c1 { my_id: 11, sort: 100})
merge (p)-[:HAS_CHILD]->(c2 { my_id: 12, sort: 200 })
merge (p)-[:HAS_CHILD]->(c3 { my_id: 13, sort: 300 })
merge (c1)-[:HAS_CHILD]->(cc1 { my_id: 111 })
merge (c2)-[:HAS_CHILD]->(cc2 { my_id: 121 })
merge (c3)-[:HAS_CHILD]->(cc3 { my_id: 131 });
The problem I'm struggling with is that often I need to make decisions based on child node rank relative to some parent node, with regads to this sort identifier. So, for example, node c1 has rank 1 relative to node p (because it has the least sort property), c2 has rank 2, and c3 has rank 3 (the biggest sort).
The kind of decision I need to make based to this information: display children only of the first 2 cX nodes. Here's what I want to get:
cc1 and cc2 are present, but cc3 is not because c3 (its parent) is not the first or the second child of p. Here's a dumb query for that:
match (p {my_id: 1 })-->(c)
optional match (c)-->(cc) where c.sort <= 200
return p, c, cc
The problem is, these sort properties are custom-set and imported, so I have no way of knowing which value will be held for child number 2.
My current solution is to rank it during import, and since I'm using Oracle, that's quite simple -- I just need to use rank window function. But it seems awkward to me and I feel like there could be more elegant solution to that. I tried the next query and it works, but it looks weird and it's quite slow on bigger graphs:
match (p {my_id: 1 })-->(c)
optional match (c)-->(cc)
where size([ (p)-->(c1) where c1.sort < c.sort |c1]) < 2
return p, c, cc
Here's the plan for this query and the most expensive part is in fact the size expression:

The slowness you're seeing is likely because you're not performing an index lookup in your query, so it's performing an all nodes scan and accessing the my_id property of every node in your graph to find the one with id 1 (your p node).
You need to add labels on your nodes and use these labels in your queries (at least for your p node), and create an index (or in this case, probably a unique constraint) on the label for my_id so this lookup becomes fast.
You can confirm what's going on by doing a PROFILE of your query (if you can add the profile plan to your description, with all elements of the plan expanded that would help determine further optimizations).
As for your query, something like this should work (I'm using a :Node label as a standin for your actual label)
match (p:Node {my_id: 1 })-->(c)
with p, c
order by c.sort asc
with p, collect(c) as children // children are in order
unwind children[..2] as child // one row for each of the first 2 children
optional match (child)-->(cc) // only matched for the first 2 children
return p, children, collect(cc) as grandchildren
Note that this only returns nodes, not paths or relationships. The reason why you're getting the result graph in the graphical view is because, in the Browser Setting tab (the gear icon in the lower left menu) you have Connect result nodes checked at the bottom.

Is this the optimal cypher for returning every node of a subtree?

I have a graph of a tree structure (well no, more of a DAG because i can have multiple parents) and need to be able to write queries that return all results in a flat list, starting at a particular node(s) and down.
I've reduced one of my use cases to this simple example. In the ascii representation here, n's are my nodes and I've appended their id. p is a permission in my auth system, but all that is pertinent to the question is that it marks the spot from which I need to recurse downwards to collect nodes which should be returned by the query.
There can be multiple root nodes related to p's
The roots, such as n3 below, should be contained in the results, as well as the children
The relationship depth is unbounded.
Graph:
n1
^ ^
/ \
n2 n3<--p
^ ^
/ \
n4 n5
^
/
n6
If it's helpful, here's the cypher I ran to throw together this quick example:
CREATE path=(n1:n{id:1})<-[:HAS_PARENT]-(n2:n{id:2}),
(n1)<-[:HAS_PARENT]-(n3:n{id:3})<-[:HAS_PARENT]-(n4:n{id:4}),
(n3)<-[:HAS_PARENT]-(n5:n{id:5}),
(n4)<-[:HAS_PARENT]-(n6:n{id:6})
MATCH (n{id:3})
CREATE (:p)-[:IN]->(n)
Here is the current best query I have:
MATCH (n:n)<--(:p)
WITH collect (n) as parents, (n) as n
OPTIONAL MATCH (c)-[:HAS_PARENT*]->(n)
WITH collect(c) as children, (parents) as parents
UNWIND (parents+children) as tree
RETURN tree
This returns the correct set of results, and unlike some previous attempts I made which did not use any collect/unwind, the results come back as a single column of data as desired.
Is this the most optimal way of making this type of query? It is surprisingly more complex than I thought the simple scenario called for. I tried some queries where I combined the roots ("parents" in my query) with the "children" using a UNION clause, but I could not find a way to do so without repeating the query for the relationship with p. In my real world queries, that's a much more expensive operation which i've reduced down here for the example, so I cannot run it more than once.

This might suit your needs:
MATCH (c)-[:HAS_PARENT*0..]->(root:n)<--(:p)
RETURN root, COLLECT(c) AS tree
Each result row will contain a distinct root node and a collection if its tree nodes (including the root node).

Pairs from a directed acyclic Neo4j graph

I have a DAG which for the most part is a tree... but there are a few cycles in it. I mention it in case it matters.
I have to translate the graph into pairs of relations. If:
A -> B
C
D -> 1
2 -> X
Y
Then I would produce ArB, ArC, arD, Dr1, Dr2, 2rX, 2rY, where r is some relationship information (in other words, the query cannot totally ignore it.)
Also, in my graph, node A has many cousins, so I need to 'anchor' my query to A.
My current attempt generates all possible pairs, so I get many unhelpful pairs such as ArY since A can eventually traverse to Y.
What is a query that starts (or ends) with A, that returns a list of pairs? I don't want to query Neo individually for each node - I want to get the list in one shot if possible.
The query would be great, doc pages that explain would be great. Any help is appreciated.
EDIT Here's what I have so far, using Frobber's post as inspiration:
1. MATCH p=(n {id:"some_id"})-[*]->(m)
2. WITH DISTINCT(NODES(p)) as zoot
3. MATCH (x)-[r]->(y)
4. WHERE x IN zoot AND y IN zoot
5. RETURN DISTINCT x, TYPE(r) as r, y
Where in line 1, I make a path that includes all the nodes under the one I care about.
In line 2, I start a new match that is intended to return my pairs
Line 3, I convert the path of nodes to a collection of nodes
Line 4, I accept only x and y nodes that were scooped up the first match. I am not sure why I have to include y in the condition, but it seems to matter.
Line 5, I return the results. I do not know why I need a distinct here. I thought the one on line 3 would do the trick.
So far, this is working for me. I have no insight into its performance in a large graph.

Here's an approach to try - this query is modeled off of the sample matrix data you can find online so you can play with it before adapting it to your schema.
MATCH p=(n:Crew)-[r:KNOWS*]-m
WHERE n.name='Neo'
WITH p, length(nodes(p)) AS nCount, length(relationships(p)) AS rCount
RETURN nodes(p)[nCount-2], relationships(p)[rCount-1], nodes(p)[nCount-1];
ORDER BY length(p) ASC;
A couple of notes about what's going on here:
Consider the "Neo" node (n.name="Neo") to be your "A" here. You're rooting this path traversal in some particular node you pick out.
We're matching paths, not nodes or edges.
We're going through all paths rooted at the A node, ordering by path length. This gets the near nodes before the distant nodes.
For each path we find, we're looking at the nodes and relationships in the path, and then returning the last pair. The second-to-last node (nodes(p)[nCount-2]) and the last relationship in the path (relationships(p)[rCount-1]).
This query basically returns the node, the relationship, and the connected node showing that you can get those items; from there you just customize the query to pull out whatever about those nodes/rels you might need pursuant to your schema.
The basic formula starts with matching p=(someNode {startingPoint: "A"})-[r:*]->(otherStuff); from there it's just processing paths as you go.

In neo4j is there a way to get path between more than 2 random nodes whose direction of relation is not known

I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction of relation and the relationship type.
Example : I have in the graph database with three nodes person->Purchase->Product.
I need to get the path connecting these three nodes. But I do not know the order in which I need to query, for example if I give the query as person-Product-Purchase, it will return no rows as the order is incorrect.
So in this case how should I frame the query?
In a nutshell I need to find the path between more than two nodes where the match clause may be mentioned in what ever order the user knows.

You could list all of the nodes in multiple bound identifiers in the start, and then your match would find the ones that match, in any order. And you could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/

Wouldn't be acceptable to make several queries ? In your case you'd automatically generate 6 queries with all the possible combinations (factorial on the number of variables)

A possible solution would be to first get three sets of nodes (s,m,e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because starting, middle and end node are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional where clause makes sure the same node is not used twice in one result row.
The relations are matched with one or more edges (*1..) or hops between the nodes s and m or m and e respectively and disregarding the directions.
Note that cypher 3 syntax is used here.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Finding all subtrees of nodes with a common second level relationship - neo4j

Related

Leaf nodes and paths in cypher

neo4j - restrict query based on node's rank

Is this the optimal cypher for returning every node of a subtree?

Pairs from a directed acyclic Neo4j graph

In neo4j is there a way to get path between more than 2 random nodes whose direction of relation is not known

Categories

Resources