Cypher - find nodes whose related nodes have properties contained in a set - neo4j

I need a cypher query to search for something like this:
I have a graph (a)-[r]->(b) and (a)-[r]->(c) were a is a person and b and c are 2 different skill nodes.
Let's suppose I am looking for someone knowing both java and fortran.
Say b has property name:“java” and c has property name:“fortran”.
How do I find a person that has ALL specified skill nodes?
It'd be useful if the query was scalable, i.e. if I had 20 skill nodes, it would be also easy to execute it.
Thanks a lot in advance!

One way would be to MATCH your Person nodes to Skill nodes, filter Skill nodes for your properties and count the number of nodes per Person. If it's as large as the array of properties your filtering, the Person has all the Skills
MATCH (p:Person)-[r:HAS]->(s:Skill)
WHERE s.name IN ['java', 'fortran', 'cypher']
RETURN DISTINCT p, count(s)
I think you can combine this with a CASE statement to return the data:
MATCH (p:Person)-[r:HAS]->(s:Skill)
WHERE s.name IN ['java', 'fortran', 'cypher']
RETURN
CASE
WHEN count(s) = 3
THEN p
ELSE 0
END

Related

Nodes with relationship to multiple nodes

I want to get the Persons that know everyone in a group of persons which know some specific places.
This:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(DISTINCT b) as persons
Match (a:Person)
WHERE ALL(b in persons WHERE (a)-[:knows]->(b))
RETURN a
works, but for the second part does a full nodelabelscan, before applying the where clause, which is extremely slow - in a bigger db it takes 8~9 seconds. I also tried this:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
Match (a:Person)-[:knows]->(b)
RETURN a
This only needs 2ms, however it returns all persons that know any person of group b, instead of those that know everyone.
So my question is: Is there a effective/fast query to get what i want?
We have a knowledge base article for this kind of query that show a few approaches.
One of these is to match to :Persons known by the group, and then count the number of times each of those persons shows up in the results. Provided there aren't multiple :knows relationships between the same two people, if the count is equal to the collection of people from your first match, then that person must know all of the people in the collection.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(b) as persons
UNWIND persons as b // so we have the entire list of persons along with each person
WITH size(persons) as total, b
MATCH (a:Person)-[:knows]->(b)
WITH total, a, count(a) as knownCount
WHERE total = knownCount
RETURN a
Here is a simpler Cypher query that also compares counts -- the same basic idea used by #InverseFalcon.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'}), (a:Person)-[:knows]->(b)
WITH COLLECT({a:a, b:b}) as data, COUNT(DISTINCT b) AS total
UNWIND data AS d
WITH total, d.a AS a, COUNT(d.b) AS bCount
WHERE total = bCount
RETURN a

neo4j - match nodes and add relationships to them

how do i add relationships to nodes returned by a cypher query?
I have written a query that returns me all the person nodes who have the same surname who live at the same address. I now want to add a relationship between these person nodes to indicate they are the same person. The query below returns me 3 person nodes and I want to add a relationship from the first node (returned by the ORDER BY) to the other 2.
MATCH (a:Address) <-[LIVES_AT]-(p:Person)
WITH a as addnode, p.surname as psurname, COUNT(p.name_urn) as c
WHERE c > 1
MATCH (a2:Address{address_urn:addnode.address_urn})<-[LIVES_AT]- (p2:Person{surname:psurname})
WITH p2 as p2node
ORDER BY CASE
WHEN p2node.master_record = 'Y'
THEN
1
ELSE
2
END
WITH collect(p2node) as colp2node
RETURN colp2node
Hope this makes sense? Please advise if there is a better way of doing this.
Something like this should work for you:
MATCH (a:Address)<-[LIVES_AT]-(p:Person)
WITH a, p.surname AS psurname, COUNT(p.name_urn) AS c
WHERE c > 1
MATCH (a2:Address { address_urn:a.address_urn })<-[LIVES_AT]-(p2:Person { surname:psurname })
WITH p2
ORDER BY CASE WHEN p2.master_record = 'Y' THEN 1 ELSE 2 END
WITH collect(p2) AS colp2
WITH colp2[0] AS master, colp2[1..] AS others
UNWIND others AS other
MERGE (master)-[:HAS_ALIAS]->(other);
I use MERGE to avoid duplicate relationships.
By the way, just because two people have the same surname and live at the same address, that does not normally mean they are the same person. I hope you are sure that what you are doing is appropriate.

Neo4j: Cypher Query With Variable Length and Condition on Node Labels

What I'm looking for
With variable length relationships (see here in the neo4j manual), it is possible to have a variable number of relationships with a certain label between two nodes.
# Cypher
match (g1:Group)-[:sub_group*]->(g2:Group) return g1, g2
I'm looking for the same thing with nodes, i.e. a way to query for two nodes with a variable number of nodes in between, but with a label condition on the nodes rather than the relationships:
# Looking for something like this in Cypher:
match (g1:Group)-->(:Group*)-->(g2:Group) return g1, g2
Example
I would use this mechanism, for example, to find all (direct or indirect) members of a group within a group structure.
# Looking for somthing like this in Cypher:
match (group:Group)-->(:Group*)-->(member:User) return member
Take, for example, this structure:
group1:Group
|-------> group2:Group -------> user1:User
|-------> group3:Group
|--------> page1:Page -----> group4:Group -----> user2:User
In this example, user1 is a member of group1 and group2, but user2 is only member of group4, not member of the other groups, because a non-Group labeled node is in between.
Abstraction
A more abstract pattern would be a kind of repeat operator |...|* in Cypher:
# Looking for repeat operator in Cypher:
match (g1:Group)|-[:is_subgroup_of]->(:Group)|*-[:is_member_of]->(member:User)
return member
Does anyone know of such a repeat operator? Thanks!
Possible Solution
One solution I've found, is to use a condition on the nodes using where, but I hope, there is a better (and shorter) soluation out there!
# Cypher
match path = (member:User)<-[*]-(g:Group{id:1})
where all(node in tail(nodes(path)) where ('Group' in labels(node)))
return member
Explanation
In the above query, all(node in tail(nodes(path)) where ('Group' in labels(node))) is one single where condition, which consists of the following key parts:
all: ALL(x in coll where pred): TRUE if pred is TRUE for all values in
coll
nodes(path): NODES(path): Returns the nodes in path
tail(): TAIL(coll): coll except first element–––I'm using this, because the first node is a User, not a Group.
Reference
See Cypher Cheat Sheet.
How about this:
MATCH (:Group {id:1})<-[:IS_SUBGROUP_OF|:IS_MEMBER_OF*]-(u:User)
RETURN DISTINCT u
This will:
find all subtrees of the group with ID 1
only traverse the relationships IS_GROUP_OF and IS_MEMBER_OF in incoming direction (meaning sub-groups or users that belong to group with ID or one of its sub-groups)
only return nodes which have a IS_MEMBER_OF relationship to a group in the subtree
and discard duplicate results (users who belong to more than one of the groups in the tree would otherwise appear multiple times)
I know this relies on relationships types rather than node labels, but IMHO this is a more graphy approach.
Let me know if this would work or not.

Limiting ShortestPath in Cypher to nodes with specific properties

I am trying to figure out how to limit a shortest path query in cypher so that it only connects "Person" nodes containing a specific property.
Here is my query:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {id: 2})) RETURN p
I would like to limit it so that when it connects from one Person node to another Person node, the Person node has to contain a property called "job" and a value of "engineer."
Could you help me construct the query? Thanks!
Your requirements are not very clear, but if you simply want one of the people to have an id of 1 and the other person to be an engineer, you would use this:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {job: "engineer"}))
RETURN p;
This kind query should be much faster if you also created indexes for the id and job properties of Person.

Select nodes that has all relationships in Neo4j

Suppose I have two kinds of nodes, Person and Competency. They are related by a KNOWS relationship. For example:
(:Person {id: 'thiago'})-[:KNOWS]->(:Competency {id: 'neo4j'})
How do I query this schema to find out all Person that knows all nodes of a set of Competency?
Suppose that I need to find every Person that knows "java" and "haskell" and I'm only interested in the nodes that knows all of the listed Competency nodes.
I've tried this query:
match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id;
But I get back a list of all Person that knows either "java" or "haskell" and duplicated entries for those who knows both.
Adding a count(c) at the end of the query eliminates the duplicates:
match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id, count(c);
Then, in this particular case, I can iterate the result and filter out results that the count is less than two to get the nodes I want.
I've found out that I could do it appending consecutive match clauses to keep filtering the nodes to get the result I want, in this case:
match (p:Person)-[:KNOWS]->(:Competency {id:'haskell'})
match (p)-[:KNOWS]->(:Competency {id:'java'})
return p.id;
Is this the only way to express this query? I mean, I need to create a query by concatenating strings? I'm looking for a solution to a fixed query with parameters.
with ['java','haskell'] as skills
match (p:Person)-[:KNOWS]->(c:Competency)
where c.id in skills
with p.id, count(*) as c1 ,size(skills) as c2
where c1 = c2
return p.id
One thing you can do, is to count the number of all skills, then find the users that have the number of skill relationships equals to the skills count :
MATCH (n:Skill) WITH count(n) as skillMax
MATCH (u:Person)-[:HAS]->(s:Skill)
WITH u, count(s) as skillsCount, skillMax
WHERE skillsCount = skillMax
RETURN u, skillsCount
Chris
Untested, but this might do the trick:
match (p:Person)-[:KNOWS]->(c:Competency)
with p, collect(c.id) as cs
where all(x in ['java', 'haskell'] where x in cs)
return p.id;
How about this...
WITH ['java','haskell'] AS comp_col
MATCH (p:Person)-[:KNOWS]->(c:Competency)
WHERE c.name in comp_col
WITH comp_col
, p
, count(*) AS total
WHERE total = length(comp_col)
RETURN p.name, total
Put the competencies you want in a collection.
Match all the people that have either of those competencies
Get the count of compentencies by person where they have the same number as in the competency collection from the start
I think this will work for what you need, but if you are building these queries programatically the best performance you get might be with successive match clauses. Especially if you knew which competencies were most/least common when building your queries, you could order the matches such that the least common were first and the most common were last. I think that would chunk down to your desired persons the fastest.
It would be interesting to see what the plan analyzer in the sheel says about the different approaches.

Resources