Given this example data:
CREATE
(p1:Person {name:"p1"}),
(p2:Person {name:"p2"}),
(p3:Person {name:"p3"}),
(p4:Person {name:"p4"}),
(p5:Person {name:"p5"}),
(p1)-[:KNOWS]->(p2),
(p1)-[:KNOWS]->(p3),
(p1)-[:KNOWS]->(p4),
(p5)-[:KNOWS]->(p3),
(p5)-[:KNOWS]->(p4)
I want to get the common relationships between p1 and p5:
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN p, p1, p5
This returns 4 nodes (p1, p3, p4, p5) and 4 edges.
My aim is to get the edges, with their direction, as table rows with from and to columns. This seems to work:
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN startNode(r1).name AS from, endNode(r1).name AS to
UNION
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN startNode(r2).name AS from, endNode(r2).name AS to
The result is a table :
from | to
-----|----
p1 | p3
p1 | p4
p5 | p3
p5 | p4
My questions are:
Is it correct?
Is it the best way to do it? I mean in terms of performance when there are thousands of nodes.
And what if I want the nodes common to 3 persons?
The best way to check performance is to PROFILE your queries.
Is it correct?
I'm not sure why you use a UNION; you can simply use a path check:
PROFILE MATCH (p1:Person {name:"p1"}), (p5:Person {name:"p5"})
MATCH path=(p1)-[*..2]-(p5)
UNWIND rels(path) AS r
RETURN startNode(r).name AS from, endNode(r).name AS to
Is it the best way to do it? I mean in terms of performance when there are thousands of nodes.
Generally, you would first match the start and end nodes of the path with single lookups (make sure you have an index or constraint on the label/property pair for the Person nodes).
Depending on the degree of the nodes in your graph, this can be an expensive operation; you can fine-tune it by limiting the maximum depth of the paths, for example *..15.
And what if I want the nodes common to 3 persons?
There are multiple ways, depending on the size of your graph:
a) If there are not too many nodes:
Match the 3 nodes and find the Persons that have at least one connection to ALL 3:
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
MATCH (p:Person) WHERE ALL(x IN persons WHERE EXISTS((x)--(p)))
RETURN p
b) With some tuning: assume that any common node will be directly connected to the first of the 3 nodes:
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
WITH persons, persons[0] as p
MATCH (p)-[:KNOWS]-(other)
WHERE ALL (x IN persons WHERE EXISTS((x)--(other)))
RETURN other
c) If you need the common nodes along paths of multiple depth:
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
WITH persons, persons[0] as p1, persons[1] as p2
MATCH path=(p1)-[*..15]-(p2)
WHERE ANY(x IN nodes(path) WHERE x = persons[2])
UNWIND rels(path) AS commonRel
WITH distinct commonRel AS r
RETURN startNode(r) AS from, endNode(r) AS to
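To make the logic of approach (a) concrete, here is a minimal Python sketch (not Cypher, and not a replacement for the queries above). It treats the example KNOWS graph as undirected, as the (x)--(p) pattern does, and intersects the neighbour sets of the chosen persons. The function name common_neighbours is an assumption made for this illustration.

```python
# Example KNOWS relationships from the question; direction is ignored,
# mirroring the undirected (x)--(p) pattern in approach (a).
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p5", "p3"), ("p5", "p4")]

# Build an undirected adjacency map: node name -> set of neighbour names.
neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)


def common_neighbours(names):
    """Return the nodes directly connected to every node in `names`."""
    sets = [neighbours.get(name, set()) for name in names]
    return set.intersection(*sets) if sets else set()


print(sorted(common_neighbours(["p1", "p5"])))        # ['p3', 'p4']
print(sorted(common_neighbours(["p1", "p4", "p3"])))  # []
```

On the example data, p1 and p5 share p3 and p4, while no single node is directly connected to all of p1, p4 and p3.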
I would suggest growing your graph and then trying and tuning your use cases.
Related
I was practicing with the Movie Database from Neo4j and wrote the following query:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
RETURN a
This query returns 3 rows. However, if I go to the graph view in the web editor and expand the "Tom Hanks" node, I see one movie that Tom Hanks both directed and acted in, while the rest of his connected movies have only the ACTED_IN relationship. In this case I want to filter Tom Hanks out of the result, since he has at least one movie connected by only one relationship (either ACTED_IN or DIRECTED).
P.S.: My expected result would be only the row representing the "Clint Eastwood" node.
So you only want results where the person acted in and directed the same movies, but never simply acted in, without directing, or directed, without acting.
You could use this approach:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
WITH a, count(m) as actedDirectedCount
WHERE size((a)-[:ACTED_IN]->()) = actedDirectedCount AND size((a)-[:DIRECTED]->()) = actedDirectedCount
RETURN a
Though you can simplify this a bit by combining the relationship types in the pattern used in your WHERE clause like so:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
WITH a, count(m) as actedDirectedCount
WHERE size((a)-[:ACTED_IN|DIRECTED]->()) = actedDirectedCount * 2
RETURN a
If the actedDirectedCount = 3 movies, then there must be at a minimum 3 :ACTED_IN relationships and 3 :DIRECTED relationships, so a minimum of 6 relationships using either relationship. If there are any more than this, then there are additional movies that they either acted in or directed, so we'd filter that out.
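The counting argument can be checked with a small Python sketch. The people and movie ids below are hypothetical and only illustrate the arithmetic: a person passes when the total number of ACTED_IN and DIRECTED relationships is exactly twice the number of movies they both acted in and directed.

```python
# Hypothetical data for illustration only: person -> set of movie ids.
acted = {"alice": {"m1", "m2"}, "bob": {"m1", "m3", "m4"}}
directed = {"alice": {"m1", "m2"}, "bob": {"m1"}}


def acted_and_directed_only_same_movies(person):
    """Mirror the WHERE check: total relationships == 2 * actedDirectedCount."""
    acted_in = acted.get(person, set())
    directed_by = directed.get(person, set())
    both = acted_in & directed_by               # movies matched by the MATCH
    total = len(acted_in) + len(directed_by)    # size of the ACTED_IN|DIRECTED pattern
    # The MATCH only yields people with at least one such movie.
    return len(both) > 0 and total == 2 * len(both)


print(acted_and_directed_only_same_movies("alice"))  # True
print(acted_and_directed_only_same_movies("bob"))    # False
```

Here alice has 2 shared movies and 4 relationships in total, so she is kept; bob has 1 shared movie but 4 relationships, so he is filtered out.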
Three options come to mind:
1.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with a, collect(distinct m) as directedMovies
match (a)-[:ACTED_IN]->(m:Movie)
with a, directedMovies, collect(distinct m) as actedMovies
with a where all(x in directedMovies where x in actedMovies) and all(x in actedMovies where x in directedMovies)
return a
2.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with * order by id(m)
with a, collect(distinct m) as directedMovies
match (a)-[:ACTED_IN]->(m:Movie)
with a, directedMovies, m order by id (m)
with a, directedMovies, collect(distinct m) as actedMovies
with a where actedMovies=directedMovies
return a
3.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with a, collect(distinct m) as directedMovies
with * where all(x in directedMovies where (a)-[:ACTED_IN]->(x))
MATCH (m:Movie)<-[:ACTED_IN]-(a)
with a, collect(distinct m) as actedMovies
with * where all(x in actedMovies where (a)-[:DIRECTED]->(x))
return a
The first two are equally expensive and the last one is a bit more expensive.
I'm just starting to study Cypher.
How would I write a Cypher query that returns the node, 1 to 3 hops away from the initial node, whose path has the highest average weight?
Example
Graph is:
(I know I'm not using Cypher notation here.)
A-[2]-B-[4]-C
A-[3.5]-D
It would return D, because 3.5 > (2+4)/2
And with Graph:
A-[2]-B-[4]-C
A-[3.5]-D
A-[2]-B-[4]-C-[20]-E
A-[2]-B-[4]-C-[20]-E-[80]-F
It would return E, because (2+4+20)/3 > 3.5
and F is more than 3 hops away
One way to write the query, which has the benefit of being easy to read, is
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
RETURN x.name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
Here we extract the weights in the path into a list, and unwind that list. Try inserting a RETURN there to see what the results look like at that point. Because we unwind we can use the avg() aggregation function. By returning not only the avg(weight), but also the name of the last path node, the aggregation will be grouped by that node name. If you don't want to return the weight, only the node name, then change RETURN to WITH in the query, and add another return clause which only returns the node name.
You can also add something like [n IN nodes(p) | n.name] AS nodesInPath to the RETURN clause to see what the path looks like. I created an example graph based on your question, with nodes named A, B, C, etc., using the query below.
CREATE (A {name: 'A'}),
(B {name: 'B'}),
(C {name: 'C'}),
(D {name: 'D'}),
(E {name: 'E'}),
(F {name: 'F'}),
(A)-[:R {weight: 2}]->(B),
(B)-[:R {weight: 4}]->(C),
(A)-[:R {weight: 3.5}]->(D),
(C)-[:R {weight: 20}]->(E),
(E)-[:R {weight: 80}]->(F)
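As a cross-check of the averaging logic on this example graph, the following Python sketch enumerates every path of 1 to 3 hops from a start node, ignoring direction and never reusing a relationship within a path, and keeps the end node with the highest average weight. It is an illustration under those assumptions, not a translation of the query planner's behaviour; the function name best_target is hypothetical.

```python
def best_target(edges, start, max_hops=3):
    """Return (average_weight, end_node) of the best 1..max_hops path from start."""
    # Undirected adjacency: node -> list of (neighbour, weight, edge_id).
    adjacency = {}
    for edge_id, (u, v, weight) in enumerate(edges):
        adjacency.setdefault(u, []).append((v, weight, edge_id))
        adjacency.setdefault(v, []).append((u, weight, edge_id))

    best = None
    # Each stack entry: (current node, weights so far, edges already used).
    stack = [(start, [], frozenset())]
    while stack:
        node, weights, used = stack.pop()
        if weights:
            average = sum(weights) / len(weights)
            if best is None or average > best[0]:
                best = (average, node)
        if len(weights) < max_hops:
            for neighbour, weight, edge_id in adjacency.get(node, []):
                if edge_id not in used:  # a relationship appears once per path
                    stack.append((neighbour, weights + [weight], used | {edge_id}))
    return best


small = [("A", "B", 2), ("B", "C", 4), ("A", "D", 3.5)]
full = small + [("C", "E", 20), ("E", "F", 80)]

print(best_target(small, "A")[1])  # D
print(best_target(full, "A")[1])   # E
```

For the full graph the winning path is A-B-C-E with an average of (2 + 4 + 20) / 3, and F is not considered because it is 4 hops away.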
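1) To select the possible paths with a length of one to three, use MATCH with a variable-length relationship. Anchor the start node by its name property; a bare (A) is only a variable and would match any node:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
2) Then use the reduce function to calculate the average weight, and sort and limit to get a single result. Starting the accumulator at 0.0 avoids integer division when all weights on a path are integers:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
WITH p, T,
reduce(s=0.0, r in rels(p) | s + r.weight)/length(p) AS weight
RETURN T ORDER BY weight DESC LIMIT 1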
I have the following graph:
I need to get all the AD nodes which are related to a particular User node. If I search by a user B1, I should get all the AD nodes which are connected by HAS relation to B1 node as well as the AD nodes which are connected to its parent by HAS relation. But if any of these AD nodes are connected by an EXCLUDES relation, I should filter that one out.
For example, if I search by B1, I should get AD4,AD2
AD1 has EXCLUDES with D1 and AD3 has excludes with C1, hence filtered out.
I am using the following Cypher query:
MATCH path=(p:AD)-[:HAS|EXCLUDES]-()<-[:CHILD_OF*]-(u:User) USING INDEX u:User(id) WHERE u.id = 'B1'
with p,
collect( filter( r in rels(path)
where type(r) = 'EXCLUDES'
)
) as test
where all( t in test where size(t) = 0 )
return p
The issue is that when I search with C1, it returns AD4, AD3, AD2. How can I eliminate AD3 from the result?
:CHILD_OF* doesn't include your starting node. To include it, set a lower bound of 0:
[:CHILD_OF*0..]
That said, there are probably better ways to form your query. Try this, maybe:
MATCH (u:User)
WHERE u.id = 'B1'
WITH u, [(p:AD)-[:EXCLUDES]-()<-[:CHILD_OF*0..]-(u) | p] as excluded
MATCH (p:AD)-[:HAS]-()<-[:CHILD_OF*0..]-(u)
WHERE not p in excluded
RETURN p
EDIT
The pattern comprehension feature was released with Neo4j 3.1. You won't be able to use that in an older version. Try this instead:
MATCH (u:User)
WHERE u.id = 'B1'
OPTIONAL MATCH (p:AD)-[:EXCLUDES]-()<-[:CHILD_OF*0..]-(u)
WITH u, collect(p) as excluded
MATCH (p:AD)-[:HAS]-()<-[:CHILD_OF*0..]-(u)
WHERE not p in excluded
RETURN p
Using Neo4J and Cypher:
Given the diagram below, I want to be able to start at node 'A' and get all the children that have a 'ChildOf' relationship with 'A', but not an 'InactiveChildOf' relationship. So, in this example, I would get back A, C and G. Also, a node can get a new parent ('H' in the diagram) and if I ask for the children of 'H', I should get B, D and E.
I have tried
match (p:Item{name:'A'}) -[:ChildOf*]-(c:Item) where NOT (p)-[:InactiveChildOf]-(c) return p,c
however, that also returns D and E.
Also tried:
match (p:Item{name:'A'}) -[rels*]-(c:Item) where None (r in rels where type(r) = 'InactiveChildOf') return p,c
But that returns all.
Hopefully, this is easy for Neo4J and I am just missing something obvious. Appreciate the help!
Example data:
MERGE (a:Item {name:'A'})
MERGE (b:Item {name:'B'})
MERGE (c:Item {name:'C'})
MERGE (d:Item {name:'D'})
MERGE (e:Item {name:'E'})
MERGE (f:Item {name:'F'})
MERGE (g:Item {name:'G'})
MERGE (h:Item {name:'H'})
MERGE (b)-[:ChildOf]->(a)
MERGE (b)-[:InactiveChildOf]->(a)
MERGE (c)-[:ChildOf]->(a)
MERGE (d)-[:ChildOf]->(b)
MERGE (e)-[:ChildOf]->(b)
MERGE (f)-[:ChildOf]->(c)
MERGE (f)-[:InactiveChildOf]->(c)
MERGE (g)-[:ChildOf]->(c)
MERGE (b)-[:ChildOf]->(h)
Note, I understand that I could simply put an "isActive" property on the ChildOf relationship or remove the relationship, but I am exploring options and trying to understand if this concept would work.
If the query is interpreted as "find all nodes reachable by a path in which no two consecutive nodes are related by InactiveChildOf", it might look something like this:
match path = (p:Item{name:'A'})<-[:ChildOf*]-(c:Item)
with nodes(path) as nds
unwind range(0,size(nds)-2) as i
with nds,
nds[i] as i1,
nds[i+1] as i2
where not (i1)-[:InactiveChildOf]-(i2)
with nds,
count(i1) as test
where test = size(nds)-1
return head(nds),
last(nds)
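The pairwise check in this query can be illustrated with a short Python sketch over the example data from the question. It walks down the ChildOf hierarchy and stops at any parent/child pair that is also linked by InactiveChildOf (in either direction, as in the undirected pattern above). The function name active_descendants is an assumption for this illustration.

```python
# (child, parent) pairs taken from the example data in the question.
child_of = [("B", "A"), ("C", "A"), ("D", "B"), ("E", "B"),
            ("F", "C"), ("G", "C"), ("B", "H")]
inactive = {("B", "A"), ("F", "C")}

# parent -> list of direct children via ChildOf.
children = {}
for child, parent in child_of:
    children.setdefault(parent, []).append(child)


def active_descendants(root):
    """Descendants of root reachable without crossing an inactive pair."""
    found = set()
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            # Skip the whole branch if this pair is marked inactive.
            if (child, parent) in inactive or (parent, child) in inactive:
                continue
            if child not in found:
                found.add(child)
                stack.append(child)
    return sorted(found)


print(active_descendants("A"))  # ['C', 'G']
print(active_descendants("H"))  # ['B', 'D', 'E']
```

Because the check is made per pair of consecutive nodes, B is excluded under A (where the pair is inactive) but still counts as an active child of H.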
Update: I think this version is better (it checks that no path between the two nodes contains an InactiveChildOf relationship):
match path = (p:Item {name:'A'})<-[:ChildOf|InactiveChildOf*]-(c)
with p, c,
collect( filter( r in rels(path)
where type(r) = 'InactiveChildOf'
)
) as test
where all( t in test where size(t) = 0 )
return p, c
From reading and examining the graph (correct me if I'm wrong), the Cypher query can be put into words as:
Find me nodes in a path to A, all nodes in that path cannot have an outgoing
InactiveChildOf relationship.
So, in Cypher it would be:
MATCH p=(i:Item {name:"A"})<-[:ChildOf*]-(x)
WHERE NONE( x IN nodes(p) WHERE (x)-[:InactiveChildOf]->() )
UNWIND nodes(p) AS n
RETURN distinct n
Which returns A, C and G for the example data.
My Cypher query finds the common child nodes for several starting nodes. For each path, only its node ids are extracted and returned, which results in hundreds of rows per start node. See the p1 and p2 example below (showing only 3 rows each for two start points).
MATCH p1=(n1:node {name:"x"})-[:sub*]->(i), p2=(n2:node {name:"y"})-[:sub*]->(i)
RETURN DISTINCT i, extract(a IN nodes(p1)| a.id) as p1, extract(b IN nodes(p2)| b.id) as p2
----RESULTS----
p1=1,4,3
p1=1,8,3
p1=1,8,9,3
p2=6,7,3
p2=6,5,9,3
p2=6,7,10,3
What I would like is to intersect the paths in Cypher during the query so that I don't have to do it afterwards. In PHP I would use:
$result = array_intersect($p1,$p2);
This would return 9,3 from the above example because they are the only common nodes shared by all paths. Is there a way to do this in Cypher so that I don't have hundreds of rows returned?
Thanks!
I believe this will meet your needs.
Here is a picture of the data under consideration.
// match the two different paths with the common ending i
match p1=(n1:Node {name: 1 })-[:SUB*]->(i)
, p2=(n2:Node {name: 6 })-[:SUB*]->(i)
// collect both sets of paths for every i
with i, collect(p1) as p1, collect(p2) as p2
// recombine the nodes of the first path(s) as distinct collections of nodes
unwind p1 as p
unwind nodes(p) as n
with i, p2, collect( distinct n ) as p1
// recombine the nodes of the second path(s) as distinct collections of nodes
unwind p2 as p
unwind nodes(p) as n
with i, p1, collect( distinct n ) as p2
// return the common ending node with the nodes common to each path
return i, [n in p1 where n in p2 | n.name] as n
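What the query computes can be sketched in Python using the sample rows from the question: for each start node, take the distinct nodes that appear on any of its paths, then keep the nodes present in every group. This is only an illustration of the collect/unwind steps; the helper names are hypothetical.

```python
# Sample paths from the question, as lists of node ids.
paths_from_1 = [[1, 4, 3], [1, 8, 3], [1, 8, 9, 3]]
paths_from_6 = [[6, 7, 3], [6, 5, 9, 3], [6, 7, 10, 3]]


def nodes_on_any_path(paths):
    """Distinct nodes across all paths, in first-seen order (collect(distinct n))."""
    seen = []
    for path in paths:
        for node in path:
            if node not in seen:
                seen.append(node)
    return seen


def common_nodes(*path_groups):
    """Nodes that appear in the distinct-node list of every path group."""
    first, *rest = [nodes_on_any_path(group) for group in path_groups]
    return [node for node in first if all(node in other for other in rest)]


print(common_nodes(paths_from_1, paths_from_6))  # [3, 9]
```

Passing a third list of paths to common_nodes corresponds to adding a third start node to the query.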
EDIT: updated solution to include a third path
// match the three different paths with the common ending i
match p1=(n1:Node {name: 1 })-[:SUB*]->(i)
, p2=(n2:Node {name: 6 })-[:SUB*]->(i)
, p3=(n3:Node {name: 4 })-[:SUB*]->(i)
// collect all three sets of paths for every i
with i, collect(p1) as p1, collect(p2) as p2, collect(p3) as p3
// recombine the nodes of the first path(s) as distinct collections of nodes
unwind p1 as p
unwind nodes(p) as n
with i, p2, p3, collect( distinct n ) as p1
// recombine the nodes of the second path(s) as distinct collections of nodes
unwind p2 as p
unwind nodes(p) as n
with i, p1, p3, collect( distinct n ) as p2
// recombine the nodes of the third path(s) as distinct collections of nodes
unwind p3 as p
unwind nodes(p) as n
with i, p1, p2, collect( distinct n ) as p3
// return the common ending node with the nodes common to each path
return i, [n in p1 where n in p2 and n in p3 | n.name] as n