Neo4j count based on single relationship to node - neo4j

I would like to find all persons who participated in all specified movies, for example in 2 movies: "The Terminator", "True Lies"
I have the following query:
MATCH (t:Title)-[:ACTS_IN]-(p:Principal)-[:ACTS_BY]->(n:Name)
WHERE t.originalTitle IN ["The Terminator", "True Lies"]
WITH n, collect(n) as names
WHERE SIZE(names) >= 2
RETURN n.primaryName
which works fine if every person participated (:ACTS_BY relationship) only once in every movie. But according to my database schema design, every person can have 0-N :ACTS_BY relationships between Principal and Name nodes(for example the same person can be producer and actor of the movie at the same time).
The issue is that the mentioned Cypher query will also return the person(Name node) in case that person participated 2+ times in one movie and 0 times in another but I only need to return the Name node in case the person participated in each movie.
Please help to improve the query in order to achieve it.

To fix this, you'll want to get distinct values of t, p, n to weed out the duplicates, and only then do a count:
MATCH (t:Title)-[:ACTS_IN]-(p:Principal)-[:ACTS_BY]->(n:Name)
WHERE t.originalTitle IN ["The Terminator", "True Lies"]
WITH DISTINCT t, p, n
WITH n, count(n) as occurrences
WHERE occurrences >= 2
RETURN n.primaryName

Related

What is missing in this Cypher query?

I'm learning Cypher and I created a 'Crime investigation' project on Neo4j.
I'm trying to return as an output the parent that only has two sons/daughters in total and each member of the family must have committed a crime.
So, in order to get this in the graph, I executed this query:
match(:Crime)<-[:PARTY_TO]-(p:Person)-[:FAMILY_REL]->(s:Person)-[:PARTY_TO]->(:Crime)
where size((p)-[:FAMILY_REL]->())=2
return p, s
FAMILY_REL relation shows the sons the Person (p) and PARTY_TO relation shows the Crime nodes a Person have committed.
The previous query it's not working as it should. It shows parents with more than two sons and also sons that have just one son.
What is wrong with the logic of the query?
SIZE((p)-[:FAMILY_REL]->()) counts all children of p, including ones who had committed no crimes.
This query should work better, as it only counts children who are criminals:
MATCH (:Crime)<-[:PARTY_TO]-(p:Person)-[:FAMILY_REL]->(s:Person)-[:PARTY_TO]->(:Crime)
WITH p, COLLECT(s) AS badKids
WHERE SIZE(badKids) = 2
RETURN p, badKids

Neo4j: query to find nodes with most relationship

I'm trying to find which movie has the most number of actors in it in my database.Here's what i came up with but it kept giving me blank.
MATCH (m:Movie)
WITH m, SIZE(()-[:ACTED_IN]->(m)) as actorCnt
MATCH (a)-[:ACTED_IN]->(m)
RETURN m, a
Maybe you did not wait long enough, because your query is trying to return all the actors for every movie.
This query should return a list of the actors for the (single) movie with the most actors:
MATCH (m:Movie)
WITH m
ORDER BY SIZE(()-[:ACTED_IN]->(m)) DESC
LIMIT 1
RETURN m, [(a)-[:ACTED_IN]->(m)|a] AS actors
It orders the movies by descending number of actors, takes just the first one, and returns it and a list of all its actors.

Using UNWIND on a list I created to return multiple values (Cypher)

I am using the "Movies" database in Neo4j to simplify my question (type :play movies in the query box of an empty sandbox). For a list of 3 actors that I specify, I want to determine the total number of movies they've worked on, the number of movies they've acted in, and the number of movies they've directed (if any). Here is what I came up with:
MATCH (p:Person)-->(m:Movie)
WITH p, m, count(m) AS total
MATCH (p)-[:ACTED_IN]->(m)
WITH p, m, total, count(DISTINCT m) AS actedIn
MATCH (p)-[:DIRECTED]->(m)
WITH p, m, total, actedIn, count(DISTINCT m) AS directed
UNWIND ["Tom Hanks", "Clint Eastwood", "Charlize Theron"] AS actors
RETURN DISTINCT actors, total, actedIn, directed
Currently, it is retuning that each actor acted in 1 movie and directed 1 movie, which is incorrect. I need to keep the WITH clauses in the query and I need to define the list of actors.
In the real query I am working on that compares to this simpler one, the same thing is happening where each element of the list I defined returns the same numbers as the other elements in the list. I am not sure what I am doing wrong here.
I think this query will work for you.
Since every person has been involved in a movie in some capacity the first MATCH can asser that and then the subsequent ones can be optional.
// Find the people that worked in total movies controlled by your list
MATCH (p:Person)-->(m:Movie)
WHERE p.name IN ["Tom Hanks", "Clint Eastwood", "Charlize Theron"]
// carry the people and the total movies per person
WITH p, count(m) AS total
// find the movies those people acted in
OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
// carry the people, total movies and the movies acted in
WITH p, total, count(m) AS actedIn
// find the movies they directed
OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
RETURN p.name, total, actedIn, count(m) AS directed

Nodes with relationship to multiple nodes

I want to get the Persons that know everyone in a group of persons which know some specific places.
This:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(DISTINCT b) as persons
Match (a:Person)
WHERE ALL(b in persons WHERE (a)-[:knows]->(b))
RETURN a
works, but for the second part does a full nodelabelscan, before applying the where clause, which is extremely slow - in a bigger db it takes 8~9 seconds. I also tried this:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
Match (a:Person)-[:knows]->(b)
RETURN a
This only needs 2ms, however it returns all persons that know any person of group b, instead of those that know everyone.
So my question is: Is there a effective/fast query to get what i want?
We have a knowledge base article for this kind of query that show a few approaches.
One of these is to match to :Persons known by the group, and then count the number of times each of those persons shows up in the results. Provided there aren't multiple :knows relationships between the same two people, if the count is equal to the collection of people from your first match, then that person must know all of the people in the collection.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(b) as persons
UNWIND persons as b // so we have the entire list of persons along with each person
WITH size(persons) as total, b
MATCH (a:Person)-[:knows]->(b)
WITH total, a, count(a) as knownCount
WHERE total = knownCount
RETURN a
Here is a simpler Cypher query that also compares counts -- the same basic idea used by #InverseFalcon.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'}), (a:Person)-[:knows]->(b)
WITH COLLECT({a:a, b:b}) as data, COUNT(DISTINCT b) AS total
UNWIND data AS d
WITH total, d.a AS a, COUNT(d.b) AS bCount
WHERE total = bCount
RETURN a

Neo4j: multiple counts from multiple matches

Given a neo4j schema similar to
(:Person)-[:OWNS]-(:Book)-[:CATEGORIZED_AS]-(:Category)
I'm trying to write a query to get the count of books owned by each person as well as the count of books in each category so that I can calculate the percentage of books in each category for each person.
I've tried queries along the lines of
match (p:Person)-[:OWNS]-(b:Book)-[:CATEGORIZED_AS]-(c:Category)
where person.name in []
with p, b, c
match (p)-[:OWNS]-(b2:Book)-[:CATEGORIZED_AS]-(c2:Category)
with p, b, c, b2
return p.name, b.name, c.name,
count(distinct b) as count_books_in_category,
count(distinct b2) as count_books_total
But the query plan is absolutely horrible when trying to do the second match. I've tried to figure out different ways to write the query so that I can do the two different counts, but haven't figured out anything other than doing two matches. My schema isn't really about people and books. The :CATEGORIZED_AS relationship in my example is actually a few different relationship options, specified as [:option1|option2|option3]. So in my 2nd match I repeat the relationship options so that my total count is constrained by them.
Ideas? This feels similar to Neo4j - apply match to each result of previous match but there didn't seem to be a good answer for that one.
UNWIND is your friend here. First, calculate the total books per person, collecting them as you go.
Then unwind them so you can match which categories they belong to.
Aggregate by category and person, and you should get the number of books in each category, for a person
match (p:Person)-[:OWNS]->(b:Book)
with p,collect(b) as books, count(b) as total
with p,total,books
unwind books as book
match (book)-[:CATEGORIZED_AS]->(c)
return p,c, count(book) as subtotal, total

Resources