Need only common nodes across multiple paths - Neo4j Cypher - path

My Cypher query finds the common child nodes for several starting nodes. For each path, only its node ids are extracted and returned, which results in hundreds of rows per path. See the p1 and p2 example below (showing only 3 rows for each of two starting points).
MATCH p1=(n1:node {name:"x"})-[:sub*]->(i), p2=(n2:node {name:"y"})-[:sub*]->(i)
RETURN DISTINCT i, extract(a IN nodes(p1)| a.id) as p1, extract(b IN nodes(p2)| b.id) as p2
----RESULTS----
p1=1,4,3
p1=1,8,3
p1=1,8,9,3
p2=6,7,3
p2=6,5,9,3
p2=6,7,10,3
What I would like is to intersect the paths in Cypher during the query, so that I don't have to do it afterwards. In PHP I would use:
$result = array_intersect($p1,$p2);
This would return 9,3 from the above example, because those are the only nodes that appear in both the p1 paths and the p2 paths. Is there a way to do this in Cypher, so that I don't get hundreds of rows back?
Thanks!

I believe this will meet your needs.
[Figure: the sample data under consideration]
// match the two different paths with the common ending i
match p1=(n1:Node {name: 1 })-[:SUB*]->(i)
, p2=(n2:Node {name: 6 })-[:SUB*]->(i)
// collect both sets of paths for every common ending node i
with i, collect(p1) as p1, collect(p2) as p2
// recombine the nodes of the first path(s) as distinct collections of nodes
unwind p1 as p
unwind nodes(p) as n
with i, p2, collect( distinct n ) as p1
// recombine the nodes of the second path(s) as distinct collections of nodes
unwind p2 as p
unwind nodes(p) as n
with i, p1, collect( distinct n ) as p2
// return the common ending node with the nodes common to each path
return i, [n in p1 where n in p2 | n.name] as n
EDIT: updated solution to include a third path
// match the three different paths with the common ending i
match p1=(n1:Node {name: 1 })-[:SUB*]->(i)
, p2=(n2:Node {name: 6 })-[:SUB*]->(i)
, p3=(n3:Node {name: 4 })-[:SUB*]->(i)
// collect all three sets of paths for every common ending node i
with i, collect(p1) as p1, collect(p2) as p2, collect(p3) as p3
// recombine the nodes of the first path(s) as distinct collections of nodes
unwind p1 as p
unwind nodes(p) as n
with i, p2, p3, collect( distinct n ) as p1
// recombine the nodes of the second path(s) as distinct collections of nodes
unwind p2 as p
unwind nodes(p) as n
with i, p1, p3, collect( distinct n ) as p2
// recombine the nodes of the third path(s) as distinct collections of nodes
unwind p3 as p
unwind nodes(p) as n
with i, p1, p2, collect( distinct n ) as p3
// return the common ending node with the nodes common to each path
return i, [n in p1 where n in p2 and n in p3 | n.name] as n
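To make the logic of this answer concrete, here is a minimal in-memory sketch in Python (an illustration only, not a replacement for the Cypher query). It mirrors collect(DISTINCT n) for each group of paths, followed by the list-comprehension filter, and works for any number of path groups. The function names and the third group of ids are hypothetical.

```python
# In-memory sketch of the answer's logic (not a Cypher replacement):
# collect the distinct node ids of each group of paths, like collect(DISTINCT n),
# then keep the ids from the first group that occur in every other group,
# like [n IN p1 WHERE n IN p2 AND n IN p3].


def distinct_nodes(paths):
    """Flatten a group of paths into distinct node ids, keeping first-seen order."""
    seen = []
    for path in paths:
        for node in path:
            if node not in seen:
                seen.append(node)
    return seen


def common_nodes(*path_groups):
    """Return the node ids that appear in every group of paths."""
    first, *rest = [distinct_nodes(group) for group in path_groups]
    rest_sets = [set(group) for group in rest]
    return [n for n in first if all(n in other for other in rest_sets)]


# Node ids of the p1 and p2 rows shown in the question.
p1_paths = [[1, 4, 3], [1, 8, 3], [1, 8, 9, 3]]
p2_paths = [[6, 7, 3], [6, 5, 9, 3], [6, 7, 10, 3]]
# A hypothetical third group, standing in for p3.
p3_paths = [[2, 3]]

print(common_nodes(p1_paths, p2_paths))            # [3, 9]
print(common_nodes(p1_paths, p2_paths, p3_paths))  # [3]
```

Adding another path group only adds another membership test, which is the same pattern the Cypher version follows with "and n in p3".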

Related

Neo4j Cypher : listing edges

Given this example data:
CREATE
(p1:Person {name:"p1"}),
(p2:Person {name:"p2"}),
(p3:Person {name:"p3"}),
(p4:Person {name:"p4"}),
(p5:Person {name:"p5"}),
(p1)-[:KNOWS]->(p2),
(p1)-[:KNOWS]->(p3),
(p1)-[:KNOWS]->(p4),
(p5)-[:KNOWS]->(p3),
(p5)-[:KNOWS]->(p4)
I want to get the common relationships between p1 and p5:
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN p, p1, p5
This returns 4 nodes (p1, p3, p4, p5) and 4 edges.
My aim is to get the edges, with their direction, as table rows: from and to. This seems to work:
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN startNode(r1).name AS from, endNode(r1).name AS to
UNION
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN startNode(r2).name AS from, endNode(r2).name AS to
The result is a table :
from | to
-----|----
p1 | p3
p1 | p4
p5 | p3
p5 | p4
My questions are:
Is it correct?
Is it the best way to do it? I mean with respect to performance, when there will be thousands of nodes.
And what if I want common nodes for 3 persons?
The best way to check performance is to PROFILE your queries.
Is it correct?
I'm not sure why you use a UNION; you can simply use a path match:
PROFILE MATCH (p1:Person {name:"p1"}), (p5:Person {name:"p5"})
MATCH path=(p1)-[*..2]-(p5)
UNWIND rels(path) AS r
RETURN DISTINCT startNode(r).name AS from, endNode(r).name AS to
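As a rough check of what the path-based query returns, this Python sketch walks the question's example data in memory. It assumes undirected traversal of 1..2 hops and that no relationship is reused within one path (Cypher's rule inside a single pattern); the result set plays the role of DISTINCT or UNION de-duplication. The function name is hypothetical.

```python
# Example data from the question: (start, end) of each KNOWS relationship.
KNOWS = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p5", "p3"), ("p5", "p4")]


def edges_on_short_paths(rels, a, b, max_hops=2):
    """Return the distinct (from, to) pairs of all relationships that lie on an
    undirected path of 1..max_hops hops from a to b. As in a Cypher pattern,
    no relationship is used twice within one path."""
    found = set()

    def walk(node, used):
        # A path that has reached b counts as a match; keep walking anyway,
        # because a longer path may also end at b.
        if node == b and used:
            found.update(rels[i] for i in used)
        if len(used) == max_hops:
            return
        for i, (start, end) in enumerate(rels):
            if i in used or node not in (start, end):
                continue
            nxt = end if start == node else start  # ignore direction
            walk(nxt, used + [i])

    walk(a, [])
    return found


for frm, to in sorted(edges_on_short_paths(KNOWS, "p1", "p5")):
    print(frm, "|", to)
```

The printed rows match the from/to table in the question, while the original direction of each relationship is kept in the output.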
Is it the best way to do it? I mean with respect to performance, when there will be thousands of nodes.
Generally you would first match the start and end nodes of the path with single lookups (make sure you have an index or constraint on the label/property pair for the Person nodes).
Depending on the degree of your graph, path expansion can be an expensive operation; you can fine-tune it by limiting the maximum path depth, for example *..15.
And what if I want common nodes for 3 persons?
There are multiple ways, depending on the size of your graph:
a) if there are not too many nodes:
Match the 3 nodes and find the Persons that have at least one connection to ALL 3:
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
MATCH (p:Person) WHERE ALL(x IN persons WHERE EXISTS((x)--(p)))
RETURN p
b) with some tuning: since a common node must be directly connected to the first of the 3 nodes, expand only from that node:
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
WITH persons, persons[0] as p
MATCH (p)-[:KNOWS]-(other)
WHERE ALL (x IN persons WHERE EXISTS((x)--(other)))
RETURN other
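Approaches a) and b) differ only in which candidate nodes get tested. A minimal Python sketch on the question's example data, with hypothetical helper names, assuming relationships are treated as undirected as in (x)--(p):

```python
# Example data from the question: (start, end) of each KNOWS relationship.
KNOWS = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p5", "p3"), ("p5", "p4")]


def build_adjacency(rels):
    """Undirected neighbour sets, like the direction-less pattern (x)--(p)."""
    adj = {}
    for a, b in rels:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def common_scan_all(adj, persons):
    """Approach a): test every node against every person."""
    return {p for p in adj if all(p in adj.get(x, set()) for x in persons)}


def common_from_first(adj, persons):
    """Approach b): only neighbours of persons[0] can qualify, so start there."""
    candidates = adj.get(persons[0], set())
    return {p for p in candidates if all(p in adj.get(x, set()) for x in persons)}


adj = build_adjacency(KNOWS)
print(common_scan_all(adj, ["p1", "p4", "p3"]))            # set()
print(sorted(common_from_first(adj, ["p3", "p4"])))        # ['p1', 'p5']
```

Both functions return the same set; b) simply tests fewer candidates, which is why it tends to scale better when the first person has few neighbours.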
c) if you need the common relationships along a multi-hop path:
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
WITH persons, persons[0] as p1, persons[1] as p2
MATCH path=(p1)-[*..15]-(p2)
WHERE ANY(x IN nodes(path) WHERE x = persons[2])
UNWIND rels(path) AS commonRel
WITH distinct commonRel AS r
RETURN startNode(r) AS from, endNode(r) AS to
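Approach c) can be sketched in Python as a depth-first search that keeps every relationship lying on a path between the first two persons that also passes through the third. This is an illustration under assumptions: it uses the question's data, treats relationships as undirected, forbids reusing a relationship within one path, and passes the three persons explicitly (in the Cypher version their order comes from collect()). The function name is hypothetical.

```python
# Example data from the question: (start, end) of each KNOWS relationship.
KNOWS = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p5", "p3"), ("p5", "p4")]


def rels_on_paths_via(rels, start, end, via, max_hops=15):
    """Distinct relationships on any path start..end (1..max_hops hops) that
    contains the node `via`. No relationship is reused within one path."""
    found = set()

    def walk(node, used, visited):
        if node == end and used and via in visited:
            found.update(rels[i] for i in used)
        if len(used) == max_hops:
            return
        for i, (a, b) in enumerate(rels):
            if i in used or node not in (a, b):
                continue
            nxt = b if a == node else a  # ignore direction
            walk(nxt, used + [i], visited + [nxt])

    walk(start, [], [start])
    return found


print(sorted(rels_on_paths_via(KNOWS, "p1", "p4", "p3")))
# [('p1', 'p3'), ('p5', 'p3'), ('p5', 'p4')]
```

The search is exponential in path length, which is why capping the depth (*..15 in the query) matters.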
I would suggest growing your graph and then trying and tuning your queries against your real use cases.

cypher query to return or keep only the final sequence when variable length relationship identifiers are used

Is there a way to keep or return only the final, full sequences of nodes, instead of all subpaths, when variable-length relationships are used, so that I can do further operations on each full sequence path?
MATCH path = (S:Person)-[rels:NEXT*]->(E:Person)................
E.g.: find all sequences of nodes whose names are in a given list, say ['graph','server','db'], where the relationships in between all have the same 'seqid' property.
i.e.
(graph)->(server)->(db) with same seqid: 1
(graph)->(db)->(server) with same seqid: 1 // there can be another matching sequence with the same seqid
(graph)->(db)->(server) with same seqid: 2
Is there a way to keep only the final sequence of nodes, say (graph)->(server)->(db), for each sequence, instead of every subpath of a longer sequence, such as (graph)->(server) or (server)->(db)?
Please help me solve this.
(I am using neo4j 2.3.6 community edition via java api in embedded mode..)
What we could really use here is a longestSequences() function that would do exactly what you want: expand the pattern so that a and b always match the start and end points of a sequence, so that the matched pattern is not a subset of any other matched pattern.
I created a feature request on neo4j for exactly this: https://github.com/neo4j/neo4j/issues/7760
And until that gets implemented, we'll have to make do with some alternate approach. I think what we'll have to do is add additional matching to restrict a and b to start and end nodes of full sequences.
Here's my proposed query:
WITH ['graph', 'server' ,'db'] as names
MATCH p=(a)-[rels:NEXT*]->(b)
WHERE ALL(n in nodes(p) WHERE n.name in names)
AND ALL( r in rels WHERE rels[0]['seqid'] = r.seqid )
WITH names, p, a, rels, b
// check if b is a subsequence node instead of an end node
OPTIONAL MATCH (b)-[rel:NEXT]->(c)
WHERE c.name in names
AND rel.seqid = rels[0]['seqid']
// remove any existing matches where b is a subsequence node
WITH names, p, a, rels, b, c
WHERE c IS NULL
WITH names, p, a, rels, b
// check if a is a subsequence node instead of a start node
OPTIONAL MATCH (d)-[rel:NEXT]->(a)
WHERE d.name in names
AND rel.seqid = rels[0]['seqid']
// remove any existing matches where a is a subsequence node
WITH p, a, b, d
WHERE d IS NULL
RETURN p, a as startNode, b as endNode
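Another approach is to find the possible start and terminal nodes for each seqid first, and then match the full path between them:
MATCH (S:Person)-[r:NEXT]->(:Person)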
// Possible starting node
WHERE NOT ( (:Person)-[:NEXT {seqid: r.seqid}]->(S) )
WITH S,
// Collect all possible values of `seqid`
collect (distinct r.seqid) as seqids
UNWIND seqids as seqid
// Possible terminal node
MATCH (:Person)-[r:NEXT {seqid: seqid}]->(E:Person)
WHERE NOT ( (E)-[:NEXT {seqid: seqid}]->(:Person) )
WITH S,
seqid,
collect(distinct E) as ES
UNWIND ES as E
MATCH path = (S)-[rels:NEXT* {seqid: seqid}]->(E)
RETURN S,
seqid,
path
[EDITED]
This query might do what you want:
MATCH (p1:Person)-[rel:NEXT]->(:Person)
WHERE NOT (:Person)-[:NEXT {seqid: rel.seqid}]->(p1)
WITH DISTINCT p1, rel.seqid AS seqid
MATCH path = (p1)-[:NEXT* {seqid: seqid}]->(p2:Person)
WHERE NOT (p2)-[:NEXT {seqid: seqid}]->(:Person)
RETURN path;
It first identifies all Person nodes (p1) with at least one outgoing NEXT relationship that have no incoming NEXT relationships (with the same seqid), and their distinct outgoing seqid values. Then it finds all "complete" paths (i.e., paths whose start and end nodes have no incoming or outgoing NEXT relationships with the desired seqid, respectively) starting at each p1 node and having relationships all sharing the same seqid. Finally, it returns each complete path.
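The same idea (start nodes with no incoming NEXT of the same seqid, end nodes with no outgoing one) can be sketched in Python to see why only the full sequences survive. The node names and seqid values below are hypothetical, and the sketch assumes each seqid forms acyclic chains.

```python
# Hypothetical NEXT relationships as (from_node, to_node, seqid).
NEXT = [
    ("graph1", "server1", 1),
    ("server1", "db1", 1),
    ("graph2", "db2", 2),
    ("db2", "server2", 2),
]


def full_sequences(rels):
    """Return (seqid, [nodes]) for every maximal NEXT chain sharing one seqid.
    Assumes each seqid forms acyclic chains (a cycle would recurse forever)."""
    successors = {}
    has_incoming = set()
    for a, b, seq in rels:
        successors.setdefault((a, seq), []).append(b)
        has_incoming.add((b, seq))

    # Start nodes: an outgoing NEXT with this seqid but no incoming one.
    starts = sorted({(a, seq) for a, _, seq in rels if (a, seq) not in has_incoming})

    sequences = []

    def extend(path, seq):
        nexts = successors.get((path[-1], seq), [])
        if not nexts:  # end node: no outgoing NEXT with this seqid
            sequences.append((seq, path))
            return
        for b in nexts:
            extend(path + [b], seq)

    for a, seq in starts:
        extend([a], seq)
    return sequences


for seq, path in full_sequences(NEXT):
    print(seq, "->".join(path))
```

Subpaths such as graph1->server1 never appear, because a path is only reported once it has reached a node with no further NEXT of the same seqid.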
If you just want to get the name property of all the Person nodes in each path, try this query (with a different RETURN clause):
MATCH (p1:Person)-[rel:NEXT]->(:Person)
WHERE NOT (:Person)-[:NEXT {seqid: rel.seqid}]->(p1)
WITH DISTINCT p1, rel.seqid AS seqid
MATCH path = (p1)-[:NEXT* {seqid: seqid}]->(p2:Person)
WHERE NOT (p2)-[:NEXT {seqid: seqid}]->(:Person)
RETURN EXTRACT(n IN NODES(path) | n.name);

Tuple of nearest neighbours in a subgraph in neo4J

I have a subgraph in neo4j with multiple paths, generated via:
match p=((n:Actor)-[*1..3]->(m:film)) where n.surname='Craig' and m.name='Minions' and ALL(x in nodes(p)[1..length(p)-1] where labels(x)[0]='Director') return p
Now, from this subgraph I want a list of tuples, where each tuple is a pair of connected nodes in the subgraph:
node0, node1
node1, node3
node0, node2
node2, node26
I tried:
match p=((n:Actor)-[*1..3]->(m:film))
where n.surname='Craig' and m.name='Minions' and ALL(x in nodes(p)[1..length(p)-1] where labels(x)[0]='Director')
with nodes(p) as np
match p2=((nn)-[]-()) where nn IN np
return p2
but this just returned the nearest neighbours of every node in p, including nodes that are not in the subgraph.
This seems to work
MATCH p=((n:Actor)-[*1..3]->(m:Film))
WHERE n.surname='Craig' AND m.name='minions' AND ALL(x in nodes(p)[1..length(p)-1] WHERE labels(x)[0]='Director')
MATCH p2=(n2)-[r]-(m2)
WHERE n2 IN nodes(p) AND m2 IN nodes(p)
RETURN
n2,r,m2
However, it is very slow. Any recommendations for speeding it up?
Rather than taking all of the data from the paths and re-matching it against the graph, you could process the paths for pairs in memory. If you take the nodes from each path and process them two at a time, you can collect them as ordered pairs.
...
// for each path grab the nodes
// and an index for them less the last one
//
with nodes(p) as node_list, range(0, size(nodes(p)) - 2, 1) as idx
//
// put the tuples in ordered pairs
//
unwind idx as i
with node_list[i] as a , node_list[i+1] as b
with
case
when id(a) < id(b) then [id(a), id(b)]
else [id(b), id(a)]
end as tuple
return tuple, count(*) as occurrences
order by occurrences desc
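The in-memory pairing idea translates directly to Python. This sketch uses hypothetical node ids in place of id(a) and id(b), and Counter.most_common() stands in for count(*) with a descending sort.

```python
from collections import Counter


def adjacent_pairs(paths):
    """Count each pair of consecutive nodes, ordering each pair so that
    (a, b) and (b, a) are counted as the same tuple."""
    counts = Counter()
    for nodes in paths:
        for a, b in zip(nodes, nodes[1:]):
            counts[(a, b) if a < b else (b, a)] += 1
    return counts.most_common()  # sorted by count, highest first


# Hypothetical node ids for three paths through the subgraph.
paths = [[0, 1, 3], [0, 2, 26], [3, 1, 0]]
for pair, n in adjacent_pairs(paths):
    print(pair, n)
```

Ordering each pair by id is what lets the same relationship, traversed in either direction by different paths, be counted once per occurrence under a single key.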

Cypher Query for neo4j to get the desired traversal

I am modifying the question so that it can be tested easily.
[Figure: graph used for testing]
Green Nodes are Organizations and Blue Nodes are Person. Here is the script to create this graph:
CREATE (A:Organization {PRID:'A', Name:'Organization-A'})
CREATE (B:Organization {PRID:'B', Name:'Organization-B'})
CREATE (C:Organization {PRID:'C', Name:'Organization-C'})
CREATE (D:Organization {PRID:'D', Name:'Organization-D'})
CREATE (E:Organization {PRID:'E', Name:'Organization-E'})
CREATE (F:Organization {PRID:'F', Name:'Organization-F'})
CREATE (G:Organization {PRID:'G', Name:'Organization-G'})
CREATE (H:Organization {PRID:'H', Name:'Organization-H'})
CREATE (I:Organization {PRID:'I', Name:'Organization-I'})
CREATE (P1:Person {PRID:'P1', Name:'Person-P1'})
CREATE (P2:Person {PRID:'P2', Name:'Person-P2'})
CREATE (P3:Person {PRID:'P3', Name:'Person-P3'})
CREATE (P4:Person {PRID:'P4', Name:'Person-P4'})
CREATE (P5:Person {PRID:'P5', Name:'Person-P5'})
CREATE (P6:Person {PRID:'P6', Name:'Person-P6'})
CREATE
(B)-[:CONTROL]->(A),
(C)-[:CONTROL]->(A),
(D)-[:CONTROL]->(C),
(E)-[:CONTROL]->(C),
(G)-[:CONTROL]->(F),
(H)-[:CONTROL]->(F),
(D)-[:EMPLOYS]->(P1),
(P1)-[:SPOUSE]->(P2),
(P2)-[:CONSULTS]->(E),
(B)-[:EMPLOYS]->(P3),
(P3)-[:SPOUSE]->(P4),
(P4)-[:CONSULTS]->(I),
(H)-[:EMPLOYS]->(P5),
(P5)-[:SPOUSE]->(P6)
;
I am trying to write a Cypher query that needs to accomplish the following:
a) Start with a node with PRID = 'C'
b) Path p1: all the nodes connected to the starting node by CONTROL relationships (recursively), irrespective of direction
c) Path p2: optionally match the following relationship pattern
(x1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(x2)
where (x1) and (x2) are nodes from path p1, found in step (b).
Return p1 and p2.
So far I have tried the following three queries (with Brian's help):
Query1:
MATCH p1=(x:Organization {PRID: 'C'})-[r:CONTROL*]-(y)
OPTIONAL MATCH p2=(node1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(node2)
WHERE node1 in nodes(p1) and node2 in nodes(p1)
RETURN p1,p2;
Query2:
MATCH p1=(x:Organization {PRID: 'C'})-[r:CONTROL*]-(y)
WITH p1, nodes(p1) AS p1_nodes
UNWIND p1_nodes AS node1
UNWIND p1_nodes AS node2
WITH p1, node1, node2
OPTIONAL MATCH p2=(node1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(node2)
WHERE p2 IS NOT NULL
RETURN p1, p2;
Query3:
MATCH p1=(x:Organization {PRID: 'C'})-[r:CONTROL*]-(y)
WITH p1, EXTRACT(node IN nodes(p1) | ID(node)) AS p1_node_ids
UNWIND p1_node_ids AS id1
UNWIND p1_node_ids AS id2
OPTIONAL MATCH p2=(node1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(node2)
WHERE ID(node1) = id1 AND ID(node2) = id2 AND p2 IS NOT NULL
RETURN p1, p2;
What I'd expect is to get back the subgraph with nodes A, B, C, D, E, P1, P2 and their relationships; however, all three give me just A, B, C, D, E with their relationships (that is, just p1 and nothing from p2).
Here are some more queries we tried, which work with some anchor nodes but not with every node in the first hierarchy:
Query-4
MATCH p1=(x:Organization {PRID: 'C'})-[r:CONTROL*]-(y)
WITH collect({path: p1, node: y}) AS paths_and_nodes
UNWIND paths_and_nodes AS paths_and_node1
UNWIND paths_and_nodes AS paths_and_node2
WITH
paths_and_node1.node AS node1,
paths_and_node2.node AS node2,
paths_and_node1.path AS path1,
paths_and_node2.path AS path2
OPTIONAL MATCH p2=(node1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(node2)
RETURN path1, path2, p2, node1, node2
This one works when x is A, B or C, but does not work if x is D or E.
Query 5
MATCH
p1=(org1)-[:CONTROL*]-(x:Organization {PRID: 'C'})-[:CONTROL*]-(org2)
OPTIONAL MATCH p2=(org1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(org2)
RETURN p1, p2, org1, org2
This one works when x is C, D or E, but does not work if x is A or B.
One thought: if we write the query as
MATCH p1=(x:Organization {PRID: 'E'})-[r:CONTROL*]-(y)
OPTIONAL MATCH p2=(node1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(node2)
WHERE node1.PRID in ['A','B','C','D','E'] AND
node2.PRID in ['A','B','C','D','E']
RETURN p1,p2;
then it works fine. So can't we somehow use COLLECT etc. to create this array and pass it to the next query? The problem seems to be that if, after the first match, I use WITH p1, COLLECT(y.PRID) AS p1_prids, then p1_prids is not ['A','B','C','D','E'] but rather one row per path, each holding a single-element collection (because p1 acts as a grouping key).
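Why COLLECT returns one-element lists here can be seen with a small Python model of Cypher's grouping rule: every non-aggregated item in a WITH (here p1) becomes a grouping key. The rows below are a hypothetical rendering of the paths from C in the test graph, with each path written as a string.

```python
# Hypothetical rows from MATCH p1=(x {PRID: 'C'})-[:CONTROL*]-(y):
# one row per path, written here as (path, y.PRID).
rows = [("C-A", "A"), ("C-A-B", "B"), ("C-D", "D"), ("C-E", "E")]


def collect_grouped_by_path(rows):
    """Like WITH p1, collect(y.PRID): p1 is a grouping key, so one list per path."""
    groups = {}
    for path, prid in rows:
        groups.setdefault(path, []).append(prid)
    return groups


def collect_all(rows):
    """Like WITH collect(y.PRID): no grouping key, so a single list."""
    return [prid for _, prid in rows]


print(collect_grouped_by_path(rows))  # {'C-A': ['A'], 'C-A-B': ['B'], 'C-D': ['D'], 'C-E': ['E']}
print(collect_all(rows))              # ['A', 'B', 'D', 'E']
```

C itself is missing because [:CONTROL*] means one or more hops; with [:CONTROL*0..] the zero-length path adds the start node C as y as well.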
One way I could make it work consistently is:
MATCH (x:Organization {PRID: 'C'})-[r:CONTROL*0..]-(y)
WITH COLLECT (y.PRID) AS p1_prids
MATCH p1=(x:Organization {PRID: 'C'})-[r:CONTROL*0..]-(y)
WITH p1,p1_prids
OPTIONAL MATCH p2=(node1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(node2)
WHERE node1.PRID in p1_prids AND node2.PRID in p1_prids
RETURN p1, p2;
but I think this is very inelegant and a performance nightmare, since it runs the same match twice, so I am still looking for a solution.
What am I doing wrong in this query?
Is there a better way to approach this problem?
Thanks in advance.
Ok, let me start over with a different answer for your new dataset (thanks for that, by the way, it was really helpful!)
The problem that I wasn't realizing was that the nodes that you want to match together will be in different results of the p1 path because they're on either side of the start node. So you could do something like this:
MATCH p1=(x:Organization {PRID: 'C'})-[r:CONTROL*]-(y)
WITH collect({path: p1, node: y}) AS paths_and_nodes
UNWIND paths_and_nodes AS paths_and_node1
UNWIND paths_and_nodes AS paths_and_node2
WITH
paths_and_node1.node AS node1,
paths_and_node2.node AS node2,
paths_and_node1.path AS path1,
paths_and_node2.path AS path2
MATCH p2=(node1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(node2)
RETURN path1, path2, p2, node1, node2
Or something like this which is a lot simpler:
MATCH
p1=(org1)-[:CONTROL*]-(x:Organization {PRID: 'C'})-[:CONTROL*]-(org2),
p2=(org1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(org2)
RETURN p1, p2, org1, org2
EDIT
So I think I see the problem. I wasn't thinking about starting from any of the Organization nodes. I'm pretty sure the second query can never work for every starting node, because Cypher never traverses the same relationship twice within one MATCH pattern. Looking at the first query, the reason it doesn't work with D and E is that the default [*] variable-length relationship means one or more hops. If we allow zero or more hops instead, it seems to work:
MATCH p1=(x:Organization {PRID: 'C'})-[r:CONTROL*0..]-(y)
WITH collect({path: p1, node: y}) AS paths_and_nodes
UNWIND paths_and_nodes AS paths_and_node1
UNWIND paths_and_nodes AS paths_and_node2
WITH
paths_and_node1.node AS node1,
paths_and_node2.node AS node2,
paths_and_node1.path AS path1,
paths_and_node2.path AS path2
OPTIONAL MATCH p2=(node1)-[:EMPLOYS]->()-[:SPOUSE]->()-[:CONSULTS]->(node2)
RETURN path1, path2, p2, node1, node2
How's that?
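To check the expected result independently of Cypher, here is a Python sketch over the question's test graph. It computes the CONTROL component of the start node once (including the start node, as *0.. does) and then looks for EMPLOYS, SPOUSE, CONSULTS chains whose two ends are both in that component. The helper names are hypothetical.

```python
# Relationships from the question's CREATE script, as (from, to) pairs.
CONTROL = [("B", "A"), ("C", "A"), ("D", "C"), ("E", "C"), ("G", "F"), ("H", "F")]
EMPLOYS = [("D", "P1"), ("B", "P3"), ("H", "P5")]
SPOUSE = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
CONSULTS = [("P2", "E"), ("P4", "I")]


def control_component(start):
    """Every node reachable over CONTROL in either direction, including start
    itself (the zero-hop case that *0.. adds)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b in CONTROL:
            for here, there in ((a, b), (b, a)):
                if here == node and there not in seen:
                    seen.add(there)
                    stack.append(there)
    return seen


def chains_within(start):
    """EMPLOYS -> SPOUSE -> CONSULTS chains with both ends in the component."""
    members = control_component(start)
    chains = []
    for org1, employee in EMPLOYS:
        for spouse_a, spouse_b in SPOUSE:
            for consultant, org2 in CONSULTS:
                if (employee == spouse_a and spouse_b == consultant
                        and org1 in members and org2 in members):
                    chains.append((org1, employee, spouse_b, org2))
    return chains


for x in ["A", "B", "C", "D", "E", "F"]:
    print(x, sorted(control_component(x)), chains_within(x))
```

For every start node from A to E it finds the same D, P1, P2, E chain, which is the behavior the question asks for; for F it finds none.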
Let me give it a shot! ;)
So first off you can simplify your first MATCH/WHERE combo like this:
MATCH p1=(x:Organization {PID: '27762230'})-[r:`10006`|`10010`*]-(y)
So let's take that and try to do what you want:
MATCH p1=(x:Organization {PID: '27762230'})-[r:`10006`|`10010`*]-(y)
WITH p1, nodes(p1) AS p1_nodes
UNWIND p1_nodes AS node1
UNWIND p1_nodes AS node2
WITH p1, node1, node2
OPTIONAL MATCH p2=(node1)-[:`10004`]->()-[:`10051`]->()-[:`10052`]->(node2)
WHERE p2 IS NOT NULL
RETURN p1, p2
It could also be that when you call nodes(path), the objects you get back aren't nodes so much as maps of the node properties. If that's so, we should be able to match via the IDs:
MATCH p1=(x:Organization {PID: '27762230'})-[r:`10006`|`10010`*]-(y)
WITH p1, EXTRACT(node IN nodes(p1) | ID(node)) AS p1_node_ids
UNWIND p1_node_ids AS id1
UNWIND p1_node_ids AS id2
OPTIONAL MATCH p2=(node1)-[:`10004`]->()-[:`10051`]->()-[:`10052`]->(node2)
WHERE ID(node1) = id1 AND ID(node2) = id2 AND p2 IS NOT NULL
RETURN p1, p2

How to achieve the following in Cypher query?

I tried the following query (http://goo.gl/Ou2GZG):
START s=node(1), t=node(4)
MATCH p=s-[*]-pt--t
WHERE SINGLE (n1 IN nodes(p)
WHERE id(n1)=id(t))
WITH DISTINCT pt AS pts, t
MATCH p=t-[*]-pfn
WHERE NONE (n IN nodes(p)
WHERE id(n)=3 OR id(n)=7)
RETURN DISTINCT pfn AS pf
but I don't want to hard-code 3 and 7 in the penultimate line, where 3 and 7 are the nodes contained in pts. I tried the following, but I get an "Unclosed parenthesis" error:
START s=node(1), t=node(4)
MATCH p=s-[*]-pt--t
WHERE SINGLE (n1 IN nodes(p)
WHERE id(n1)=id(t))
WITH DISTINCT pt AS pts, t
MATCH p=t-[*]-pfn FOREACH(pt in pts :
WHERE NONE (n IN nodes(p)
WHERE id(n)=id(pt)))
RETURN DISTINCT pfn AS pf
I think you can use the ALL predicate to ensure that, for each node n in the path p, there is no node in pts with the same id as n:
START s=node(1), t=node(4)
MATCH p=s-[*]-pt--t
WHERE SINGLE (n1 IN nodes(p)
WHERE id(n1)=id(t))
WITH DISTINCT collect(id(pt)) AS pts, t
MATCH p=t-[*]-pfn
WHERE ALL (n IN nodes(p)
WHERE NONE (pt IN pts
WHERE id(n)= pt))
RETURN DISTINCT pfn AS pf
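Conceptually, the final query returns every node reachable from t without passing through any of the pts nodes. A minimal Python sketch of that idea uses breadth-first search on a hypothetical graph where the pts ids are given directly. One difference: the Cypher version can also return t itself if a cycle avoiding pts leads back to it, which this sketch leaves out.

```python
from collections import deque


def reachable_avoiding(edges, start, blocked):
    """Nodes reachable from start over undirected edges without passing through
    any node in blocked. start itself is not reported."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Hypothetical graph: s = 1, t = 4, and 3 and 7 are the pt nodes next to t.
edges = [(1, 3), (1, 7), (3, 4), (7, 4), (4, 5), (5, 6), (3, 8)]
pts = {3, 7}
print(sorted(reachable_avoiding(edges, 4, pts)))  # [5, 6]
```

Passing the blocked ids as a set mirrors collecting id(pt) into pts once and then testing membership, instead of hard-coding 3 and 7.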
