Neo4j find communities around nodes - neo4j

I'm not sure this is possible, but I'll ask anyway. I am trying to find groups of (Person) nodes that share at least 5 (Action) nodes, where the model is
(p:PERSON)-[:CHAT]->(a:ACTION)
I can do this for pairs of Persons that share 5 or more Actions:
MATCH path =(p1:PERSON)-[r1:CHAT]->(a:ACTION)<-[r2:CHAT]-(p2:PERSON)
WITH p1, p2, count(a) as ActionCount WHERE ActionCount >= 5
RETURN (p1)-[:CHAT]->(:ACTION)<-[:CHAT]-(p2)
However, is there a smart way to do this dynamically, or using collections, when there are more people in a shared group? I am trying to identify efficient teams based on Action metrics, flagging virtual teams if they share at least 5 actions.
many thanks

So I think you can do this by programmatically generating a query. I'm not sure if you can do this programmatically in Cypher. To generate a query easily I would do something like:
MATCH
(a:ACTION),
(a)<-[:CHAT]-(p1:PERSON),
(a)<-[:CHAT]-(p2:PERSON),
(a)<-[:CHAT]-(p3:PERSON),
(a)<-[:CHAT]-(p4:PERSON),
(a)<-[:CHAT]-(p5:PERSON)
WITH p1, p2, p3, p4, p5, count(a) as ActionCount
WHERE ActionCount >= 5
RETURN [p1, p2, p3, p4, p5], ActionCount
Note that you don't need the path and relationship variables if you're not using them later.
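Generating the query programmatically, as suggested above, could look something like the following Python sketch. The function name `build_group_query` and its parameters are made up for illustration. The `id(...) < id(...)` ordering clause is an addition that is not in the answer's query; it is there only so each group comes back once rather than once per permutation of its members.

```python
def build_group_query(group_size: int, min_shared: int = 5) -> str:
    """Build a Cypher query for groups of `group_size` people sharing actions.

    Hypothetical helper: it only assembles a query string, it does not run it.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    people = [f"p{i}" for i in range(1, group_size + 1)]
    # One pattern per person, all attached to the same ACTION node.
    patterns = ["(a:ACTION)"] + [f"(a)<-[:CHAT]-({p}:PERSON)" for p in people]
    # Assumption (not in the original answer): order the people by id so that
    # each group is returned once instead of once per permutation.
    ordering = " AND ".join(
        f"id({left}) < id({right})" for left, right in zip(people, people[1:])
    )
    names = ", ".join(people)
    return "\n".join([
        "MATCH",
        ",\n".join(patterns),
        f"WHERE {ordering}",
        f"WITH {names}, count(a) AS ActionCount",
        f"WHERE ActionCount >= {min_shared}",
        f"RETURN [{names}], ActionCount",
    ])


print(build_group_query(3))
```

The group size still has to be chosen up front, so to look for teams of several sizes you would generate and run one query per size.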

I think you said this both ways (five actions per user / five users per action). It should work the same either way:
MATCH (p:PERSON)-[:CHAT]->(a:ACTION)
WITH p, count(a) AS action_count
WHERE action_count >= 5
MATCH (p)-[:CHAT]->(a:ACTION)
RETURN p, collect(a)
I just made up what is being returned there. You should be able to return anything that you like.
Another way to do it:
MATCH (p:PERSON)
WHERE size( (p)-[:CHAT]->(:ACTION) ) >= 5
WITH p
MATCH (p)-[:CHAT]->(a:ACTION)
RETURN p, collect(a)

Related

Excluding "symmetric" results in Neo4j

I want to query a Neo4j graph for a structure that includes two interchangeable nodes, but I don't want two unique responses for each of the "symmetric" responses.
How do I express in Cypher that two nodes are interchangeable?
An example:
I want to look for the following structure in the graph with the following query:
MATCH (c:Customer)-[]->(p:Purchase)
MATCH (c:Customer)-[]->(q:Purchase)
MATCH (p)-[]->(m:Company)
MATCH (q)-[]->(m:Company)
RETURN DISTINCT c, p, q, m
The default behavior would be for Neo4j to return the following two graphs:
(i.e. The assignment of p and q to Purchase1 and Purchase2 are reversed)
How do I express that the elements p and q in my query are interchangeable, and I only need one of the above responses?
To prevent those kinds of results, you would typically have an inequality based on the node ids:
WHERE id(p) < id(q)
That said, you may be able to form this query a little cleaner like this (provided you want all purchases between a customer and a company with at least two purchases made from that customer to the company):
MATCH (c:Customer)-->(p:Purchase)-->(m:Company)
WITH c, m, collect(p) as purchases, count(p) as purchaseCount
WHERE purchaseCount >= 2
RETURN c, m, purchases
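To see why the id inequality removes the mirrored results, here is a small Python sketch. It is not Cypher; the purchase ids are invented and stand in for whatever `id(p)` and `id(q)` would return.

```python
from itertools import permutations

# Hypothetical internal ids of three Purchase nodes.
purchase_ids = [11, 12, 13]

# Without a constraint, every ordered assignment of (p, q) is a separate row.
all_pairs = list(permutations(purchase_ids, 2))

# Equivalent of WHERE id(p) < id(q): keep one ordering of each pair.
unique_pairs = [(p, q) for p, q in all_pairs if p < q]

print(len(all_pairs), unique_pairs)
```

Each unordered pair survives exactly once, because of the two mirrored rows only one can satisfy the strict inequality.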

neo4j cypher to filter multi paths based on two relationships

I have the following graph:
I need to get all the AD nodes which are related to a particular User node. If I search by a user B1, I should get all the AD nodes which are connected by HAS relation to B1 node as well as the AD nodes which are connected to its parent by HAS relation. But if any of these AD nodes are connected by an EXCLUDES relation, I should filter that one out.
For example, if I search by B1, I should get AD4,AD2
AD1 has EXCLUDES with D1 and AD3 has excludes with C1, hence filtered out.
I am using the following cypher
MATCH path=(p:AD)-[:HAS|EXCLUDES]-()<-[:CHILD_OF*]-(u:User) USING INDEX u:User(id) WHERE u.id = 'B1'
with p,
collect( filter( r in rels(path)
where type(r) = 'EXCLUDES'
)
) as test
where all( t in test where size(t) = 0 )
return p
The issue is that when I search with C1, it returns AD4,AD3,AD2. How can I eliminate AD3 from the result?
:CHILD_OF* doesn't include your starting node. To include that, set a lowerbound of 0:
[:CHILD_OF*0..]
That said, there are probably better ways to form your query. Try this, maybe:
MATCH (u:User)
WHERE u.id = 'B1'
WITH u, [(p:AD)-[:EXCLUDES]-()<-[:CHILD_OF*0..]-(u) | p] as excluded
MATCH (p:AD)-[:HAS]-()<-[:CHILD_OF*0..]-(u)
WHERE not p in excluded
RETURN p
EDIT
The pattern comprehension feature was released with Neo4j 3.1. You won't be able to use that in an older version. Try this instead:
MATCH (u:User)
WHERE u.id = 'B1'
OPTIONAL MATCH (p:AD)-[:EXCLUDES]-()<-[:CHILD_OF*0..]-(u)
WITH u, collect(p) as excluded
MATCH (p:AD)-[:HAS]-()<-[:CHILD_OF*0..]-(u)
WHERE not p in excluded
RETURN p
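The effect of the lower bound can be sketched in plain Python. The graph below is an assumption: the original picture is not available, so the hierarchy and the HAS links are invented to be consistent with the results described in the question (only the two EXCLUDES links are stated there). All function names are hypothetical.

```python
def ancestors(user, child_of, min_depth):
    """Return the CHILD_OF chain starting at `user`, skipping `min_depth` hops.

    Assumes each user has at most one parent and the hierarchy has no cycles.
    """
    chain = [user]
    while chain[-1] in child_of:
        chain.append(child_of[chain[-1]])
    return chain[min_depth:]


def visible_ads(user, child_of, has, excludes, min_depth=0):
    """ADs reachable via HAS minus ADs reachable via EXCLUDES."""
    nodes = ancestors(user, child_of, min_depth)
    granted = set().union(*(has.get(n, set()) for n in nodes))
    excluded = set().union(*(excludes.get(n, set()) for n in nodes))
    return sorted(granted - excluded)


# Invented example data (assumption, see the note above).
child_of = {"B1": "C1", "C1": "D1"}
has = {"B1": {"AD1"}, "D1": {"AD2", "AD3", "AD4"}}
excludes = {"C1": {"AD3"}, "D1": {"AD1"}}

# min_depth=1 mimics [:CHILD_OF*]: the searched user itself is skipped.
print(visible_ads("C1", child_of, has, excludes, min_depth=1))
# min_depth=0 mimics [:CHILD_OF*0..]: the searched user is included.
print(visible_ads("C1", child_of, has, excludes, min_depth=0))
```

With a lower bound of 1 the EXCLUDES link on C1 itself is never visited, which is why AD3 slips through when searching by C1.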

Neo4j: optimum path search

Given a graph of people who like rated movies, I would like to extract, for each pair of people, their highest-rated movie. I'm using the following query, which requires sorting the movies by rating for each pair of people.
MATCH (p1:People) -[:LIKES]-> (m:Movie) <-[:LIKES]- (p2:People) WHERE id(p1) < id(p2)
WITH p1, p2, m ORDER BY m.Rating desc
RETURN p1, p2, head(collect(m)) as best
I can put the movie rating (1/rating or maxRating-rating) on the :LIKES relationships, which lets me identify which movie is top-rated for both people.
MATCH (p1:People), (p2:People) call apoc.algo.dijkstra(p1, p2, 'LIKES', 'rating') YIELD path as path, weight as weight return path, weight
Is there a way to use a Dijkstra-like algorithm which would find the allOptimumPath through highest scored nodes to improve the performance of my first query and return paths rather than their starting, middle and ending nodes ?
Many thanks in advance.
Here is an alternate solution which preserves the path rather than reporting extracted nodes.
MATCH path=(p1:People)-[:LIKES]->(m:Movie)<-[:LIKES]-(p2:People)
WHERE id(p1) < id(p2)
WITH p1, p2, m, path
ORDER BY m.Rating DESC
WITH p1, p2, head(collect(path)) AS optPath
RETURN optPath

Neo4j Cypher : listing edges

Having this data for example :
CREATE
(p1:Person {name:"p1"}),
(p2:Person {name:"p2"}),
(p3:Person {name:"p3"}),
(p4:Person {name:"p4"}),
(p5:Person {name:"p5"}),
(p1)-[:KNOWS]->(p2),
(p1)-[:KNOWS]->(p3),
(p1)-[:KNOWS]->(p4),
(p5)-[:KNOWS]->(p3),
(p5)-[:KNOWS]->(p4)
I want to get common relationships between p1 and p5 :
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN p, p1, p5
This returns 4 nodes : p1, p3, p4, p5 and 4 edges.
My aim is to get edges with direction as table rows : from and to. So this seems to work :
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN startNode(r1).name AS from, endNode(r1).name AS to
UNION
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN startNode(r2).name AS from, endNode(r2).name AS to
The result is a table :
from | to
-----|----
p1 | p3
p1 | p4
p5 | p3
p5 | p4
My questions are :
Is it correct ?
Is it the best way to do it ? I mean about performance when there will be thousands of nodes.
And what if I want common nodes to 3 persons ?
The best way to check performance is to PROFILE your queries.
Is it correct ?
I'm not sure why you do a UNION, you can easily use a path check :
PROFILE MATCH (p1:Person {name:"p1"}), (p5:Person {name:"p5"})
MATCH path=(p1)-[*..2]-(p5)
UNWIND rels(path) AS r
RETURN startNode(r).name AS from, endNode(r).name AS to
Is it the best way to do it ? I mean about performance when there will be thousands of nodes.
Generally you would match first the start and end nodes of the path you want with single lookups (make sure you have an index/constraint on the label/property pair for the Person nodes).
Depending on your graph degree this can be an expensive operation; you can fine-tune it by limiting the max depth of the paths, *..15 for example.
And what if i want common nodes to 3 persons ?
There are multiple ways depending on the size of your graph :
a) if not too many nodes :
Match the 3 nodes and find Persons that have at least one connection to ALL 3:
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
MATCH (p:Person) WHERE ALL(x IN persons WHERE EXISTS((x)--(p)))
RETURN p
b) some tuning, assume one common will be directly connected to the first node in the 3
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
WITH persons, persons[0] as p
MATCH (p)-[:KNOWS]-(other)
WHERE ALL (x IN persons WHERE EXISTS((x)--(other)))
RETURN other
c) if you need the commons in a multiple depth path :
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
WITH persons, persons[0] as p1, persons[1] as p2
MATCH path=(p1)-[*..15]-(p2)
WHERE ANY(x IN nodes(path) WHERE x = persons[2])
UNWIND rels(path) AS commonRel
WITH distinct commonRel AS r
RETURN startNode(r) AS from, endNode(r) AS to
I would suggest to grow your graph and try/tune your use cases
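As a rough illustration of option a), the "connected to ALL of them" check is a set intersection over neighbours. The Python sketch below uses the KNOWS edges from the question's CREATE statement and treats them as undirected, like the `--` patterns do; the function names are made up.

```python
# KNOWS edges from the question's sample data, as (start, end) pairs.
edges = [
    ("p1", "p2"), ("p1", "p3"), ("p1", "p4"),
    ("p5", "p3"), ("p5", "p4"),
]


def neighbours(edges):
    """Undirected adjacency: each edge connects both of its endpoints."""
    adj = {}
    for start, end in edges:
        adj.setdefault(start, set()).add(end)
        adj.setdefault(end, set()).add(start)
    return adj


def common_neighbours(edges, people):
    """Nodes connected to every person in `people`."""
    adj = neighbours(edges)
    sets = [adj.get(p, set()) for p in people]
    return sorted(set.intersection(*sets)) if sets else []


print(common_neighbours(edges, ["p1", "p5"]))
```

The same function accepts three or more names; on this small sample a result only appears when every listed person shares at least one neighbour.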

Select nodes that has all relationships in Neo4j

Suppose I have two kinds of nodes, Person and Competency. They are related by a KNOWS relationship. For example:
(:Person {id: 'thiago'})-[:KNOWS]->(:Competency {id: 'neo4j'})
How do I query this schema to find out all Person that knows all nodes of a set of Competency?
Suppose that I need to find every Person that knows "java" and "haskell" and I'm only interested in the nodes that knows all of the listed Competency nodes.
I've tried this query:
match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id;
But I get back a list of all Person that knows either "java" or "haskell" and duplicated entries for those who knows both.
Adding a count(c) at the end of the query eliminates the duplicates:
match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id, count(c);
Then, in this particular case, I can iterate the result and filter out results that the count is less than two to get the nodes I want.
I've found out that I could do it appending consecutive match clauses to keep filtering the nodes to get the result I want, in this case:
match (p:Person)-[:KNOWS]->(:Competency {id:'haskell'})
match (p)-[:KNOWS]->(:Competency {id:'java'})
return p.id;
Is this the only way to express this query? I mean, I need to create a query by concatenating strings? I'm looking for a solution to a fixed query with parameters.
with ['java','haskell'] as skills
match (p:Person)-[:KNOWS]->(c:Competency)
where c.id in skills
with p, count(*) as c1, size(skills) as c2
where c1 = c2
return p.id
One thing you can do is count the total number of skills, then find the users whose number of skill relationships equals that count:
MATCH (n:Skill) WITH count(n) as skillMax
MATCH (u:Person)-[:HAS]->(s:Skill)
WITH u, count(s) as skillsCount, skillMax
WHERE skillsCount = skillMax
RETURN u, skillsCount
Chris
Untested, but this might do the trick:
match (p:Person)-[:KNOWS]->(c:Competency)
with p, collect(c.id) as cs
where all(x in ['java', 'haskell'] where x in cs)
return p.id;
How about this...
WITH ['java','haskell'] AS comp_col
MATCH (p:Person)-[:KNOWS]->(c:Competency)
WHERE c.id IN comp_col
WITH comp_col
, p
, count(*) AS total
WHERE total = size(comp_col)
RETURN p.id, total
Put the competencies you want in a collection.
Match all the people that have either of those competencies
Get the count of competencies by person, keeping only the people whose count equals the size of the competency collection from the start
I think this will work for what you need, but if you are building these queries programmatically the best performance you get might be with successive match clauses. Especially if you knew which competencies were most/least common when building your queries, you could order the matches such that the least common were first and the most common were last. I think that would chunk down to your desired persons the fastest.
It would be interesting to see what the plan analyzer in the shell says about the different approaches.
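The count-and-compare steps above can be sketched in plain Python to show the logic outside of Cypher. The function name and the sample pairs are invented for illustration; duplicate KNOWS pairs are collapsed first, since a duplicate would otherwise inflate the count.

```python
from collections import Counter


def knows_all(knows, required):
    """People who know every competency in `required`.

    `knows` is an iterable of (person_id, competency_id) pairs.
    """
    required = set(required)
    # Count, per person, how many of the required competencies they know.
    matches = Counter(
        person for person, competency in set(knows) if competency in required
    )
    # Keep only the people whose count equals the size of the required set.
    return sorted(p for p, total in matches.items() if total == len(required))


# Hypothetical sample data.
knows = [
    ("thiago", "java"), ("thiago", "haskell"), ("thiago", "neo4j"),
    ("ana", "java"), ("ana", "neo4j"),
]

print(knows_all(knows, ["java", "haskell"]))
```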
