neo4j - find nodes with strong relationship - neo4j

I have in my graph places and persons as labels, and a relationship "knows_the_place". Like:
(person)-[knows_the_place]->(place)
A person usually knows multiple places.
Now I want to find the persons with a "strong" relationship via the places (which have a lot of "places" in common), so for example I want to query all persons, that share at least 3 different places, something like this (not working!) query:
MATCH
(a:person)-[:knows_the_place]->(x:place)<-[:knows_the_place]-(b:person),
(a:person)-[:knows_the_place]->(y:place)<-[:knows_the_place]-(b:person),
(a:person)-[:knows_the_place]->(z:place)<-[:knows_the_place]-(b:person)
WHERE NOT x=y and y=z
RETURN a, b
How can I do this with neo4j Query?
Bonus-Question:
Instead of showing me the person which have x places in common with another person, even better would be, if I could get a order list like:
a shares 7 places with b
c shares 5 places with b
d shares 2 places with e
f shares 1 places with a
...
Thanks for your help!

Here you go:
MATCH (a:person)-[:knows_the_place]->(x:place)<-[:knows_the_place]-(b:person)
WITH a, b, count(x) AS count
WHERE count >= 3
RETURN a, b, count
To order:
MATCH (a:person)-[:knows_the_place]->(x:place)<-[:knows_the_place]-(b:person)
RETURN a, b, count(x) AS count
ORDER BY count(x) DESC
You can also do both by adding an ORDER BY to the of the first query.
Keep in mind that this query is a cartesian product of a and b so it will examine every combination of person nodes, which may be not great performance-wise if you have a lot of person nodes. Neo4j 2.3 should warn you about these sorts of queries.

Related

Nodes with relationship to multiple nodes

I want to get the Persons that know everyone in a group of persons which know some specific places.
This:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(DISTINCT b) as persons
Match (a:Person)
WHERE ALL(b in persons WHERE (a)-[:knows]->(b))
RETURN a
works, but for the second part does a full nodelabelscan, before applying the where clause, which is extremely slow - in a bigger db it takes 8~9 seconds. I also tried this:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
Match (a:Person)-[:knows]->(b)
RETURN a
This only needs 2ms, however it returns all persons that know any person of group b, instead of those that know everyone.
So my question is: Is there a effective/fast query to get what i want?
We have a knowledge base article for this kind of query that show a few approaches.
One of these is to match to :Persons known by the group, and then count the number of times each of those persons shows up in the results. Provided there aren't multiple :knows relationships between the same two people, if the count is equal to the collection of people from your first match, then that person must know all of the people in the collection.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(b) as persons
UNWIND persons as b // so we have the entire list of persons along with each person
WITH size(persons) as total, b
MATCH (a:Person)-[:knows]->(b)
WITH total, a, count(a) as knownCount
WHERE total = knownCount
RETURN a
Here is a simpler Cypher query that also compares counts -- the same basic idea used by #InverseFalcon.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'}), (a:Person)-[:knows]->(b)
WITH COLLECT({a:a, b:b}) as data, COUNT(DISTINCT b) AS total
UNWIND data AS d
WITH total, d.a AS a, COUNT(d.b) AS bCount
WHERE total = bCount
RETURN a

Neo4j: multiple counts from multiple matches

Given a neo4j schema similar to
(:Person)-[:OWNS]-(:Book)-[:CATEGORIZED_AS]-(:Category)
I'm trying to write a query to get the count of books owned by each person as well as the count of books in each category so that I can calculate the percentage of books in each category for each person.
I've tried queries along the lines of
match (p:Person)-[:OWNS]-(b:Book)-[:CATEGORIZED_AS]-(c:Category)
where person.name in []
with p, b, c
match (p)-[:OWNS]-(b2:Book)-[:CATEGORIZED_AS]-(c2:Category)
with p, b, c, b2
return p.name, b.name, c.name,
count(distinct b) as count_books_in_category,
count(distinct b2) as count_books_total
But the query plan is absolutely horrible when trying to do the second match. I've tried to figure out different ways to write the query so that I can do the two different counts, but haven't figured out anything other than doing two matches. My schema isn't really about people and books. The :CATEGORIZED_AS relationship in my example is actually a few different relationship options, specified as [:option1|option2|option3]. So in my 2nd match I repeat the relationship options so that my total count is constrained by them.
Ideas? This feels similar to Neo4j - apply match to each result of previous match but there didn't seem to be a good answer for that one.
UNWIND is your friend here. First, calculate the total books per person, collecting them as you go.
Then unwind them so you can match which categories they belong to.
Aggregate by category and person, and you should get the number of books in each category, for a person
match (p:Person)-[:OWNS]->(b:Book)
with p,collect(b) as books, count(b) as total
with p,total,books
unwind books as book
match (book)-[:CATEGORIZED_AS]->(c)
return p,c, count(book) as subtotal, total

Cypher - find nodes whose related nodes have properties contained in a set

I need a cypher query to search for something like this:
I have a graph (a)-[r]->(b) and (a)-[r]->(c) were a is a person and b and c are 2 different skill nodes.
Let's suppose I am looking for someone knowing both java and fortran.
Say b has property name:“java” and c has property name:“fortran”.
How do I find a person that has ALL specified skill nodes?
It'd be useful if the query was scalable, i.e. if I had 20 skill nodes, it would be also easy to execute it.
Thanks a lot in advance!
One way would be to MATCH your Person nodes to Skill nodes, filter Skill nodes for your properties and count the number of nodes per Person. If it's as large as the array of properties your filtering, the Person has all the Skills
MATCH (p:Person)-[r:HAS]->(s:Skill)
WHERE s.name IN ['java', 'fortran', 'cypher']
RETURN DISTINCT p, count(s)
I think you can combine this with a CASE statement to return the data:
MATCH (p:Person)-[r:HAS]->(s:Skill)
WHERE s.name IN ['java', 'fortran', 'cypher']
RETURN
CASE
WHEN count(s) = 3
THEN p
ELSE 0
END

Neo4j - Get all related nodes of type and create new relationship

I have a dataset that looks like this (Artefact)-[HAS]-(Keyword), keywords can be shared multiple times by artefacts. What I am trying to achieve is;
Returning most interconnected keyword nodes, count of artefacts related to keywords, count of the overlap between keyword nodes and the hop to another keyword (keyword)-(artefact)-(keywords), the "shared" artefact count between two keywords.
In other words a count of the artefact records within an intersect between two keyword nodes. For example given these three artefact nodes
1) spoon (keywords; metal, food)
2) sword (keywords; metal, fighting)
3) fork (keywords; metal, food)
The query would therefore return the keyword node, count of artefacts related to keyword (3, spoon, sword and fork), count of the keywords related by artefact between keyword nodes (metal has 2 indirect connections to food and 1 to fighting).
Once I've worked that out, for the sake of speed because I realise this is a big query, create a related_to relationship between keywords with the count of the number of artefacts they share in common. Only select 1 record to create this relationship, to test it works :) (hence limit 1)
MATCH (n:Keyword)-[r*2]-(x:Keyword)
WITH n, COUNT(r) AS c, x
LIMIT 1
MERGE (n)-[s:RELATED_KEY]-(x) SET s.weight = c
I'm using neo4j community edition (2.1.6),
Many thanks, Andy
This query will return you the first part of your answer :
MATCH (k:Keyword)
WITH k
LIMIT 1
MATCH (k)<-[:HAS]-(a)
WITH k, collect(a) as artefacts
WITH k, artefacts, size(artefacts) as c
UNWIND artefacts as artefact
MATCH (k)<-[:HAS]-(artefact)-[:HAS]->(k2)
RETURN c, artefacts, collect(distinct(k2.name)) as keywords, count(distinct(k2.name)) as keyWordsCount
However, I guess you may create the relationships between the related nodes directly :
MATCH (k:Keyword)
WITH k
LIMIT 1
MATCH (k)<-[:HAS]-(a)-[:HAS]->(other)
MERGE (k)-[r:RELATED_TO]->(other)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = r.weight + 1

neo4j cartesian product performance improvement

I have a Graph database with over 2 million nodes. I have an application which takes a social graph and does some inference on it. As one step of the algorithm, I have to get all possible combinations of a relationship [:friends] of two connected nodes. Currently, I have a query which looks like:
match (a)-[:friend]-(c), (b)-[:friend]-(d) where id(a)={ida} and id(b)={idb} return distinct c as first, d as second
So, I already know the nodes a and b and I want to get all the possible pairs that can be made from friends of a and b.
This is obviously a very slow operation. I was wondering if there is a more efficient way of getting the same result in neo4j. Perhaps adding indexes might help? Any ideas / clues are welcome!
Example
Node a has friends : x, y
Node b has friends : g, h, i``
Then the result should be:
x,g
x,h
x,i
y,g
y,h
y,i`
If you are not already you should use labels to speed up your query, which might look like:
MATCH (p1:Person)-[:FRIEND]->(p3:Person),(p2:Person)-[:FRIEND]->(p4:Person)
WHERE ID(p1) = 6 AND ID(p2) = 7
RETURN p3 as first, p4 as second
Obviously that will rely on you having created your nodes with a :Person label.
How many friends does the average node have?
I wouldn't use two patterns but just one and the IN operator.
MATCH (p:Person)-[:FRIEND]->(friend:Person)
WHERE id(p) IN [1,2,3]
RETURN p, collect(friend) as friends
Then you have no cross product and you can also return the friends nicely as collection per person.

Resources