Neo4j Cypher: WITH clause query

I'm working through some exercises on Neo4j's movies dataset. The question was:
Retrieve the actors who have acted in exactly five movies, returning the name of the actor, and the list of movies for that actor.
I wrote the following query, but it returns nothing and shows "no changes no result":
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a,m, count(m) AS numMovies
WHERE numMovies = 5
RETURN a.name,collect(m.title) AS movies
However, when I wrote the query below for the same statement, this time putting collect(m.title) AS movies in the WITH clause, I got the desired result:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(m) AS numMovies, collect(m.title) AS movies
WHERE numMovies = 5
RETURN a.name, movies
My question is: why does the result differ when collect(m.title) AS movies is written in the RETURN clause instead?

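Your first query has WITH a, m, count(m). Because a and m are not aggregated, they act as grouping keys, so count(m) is evaluated once per (a, m) pair and is 1 for each Movie node m. The WHERE numMovies = 5 filter therefore never matches.
You can check this by returning the same expressions directly after the MATCH: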
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN a, m, count(m) AS numMovies
The solution is to remove the separate m variable from the WITH clause as shown in your second query.
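As a variation on the working query, the count can also be derived from the collected list itself, so there is only one aggregate to maintain. This is a minimal sketch, assuming the standard movies dataset (Person and Movie labels, ACTED_IN relationships, name and title properties):

```cypher
// Group by actor only: a is the sole grouping key,
// so collect() gathers all of that actor's titles into one list.
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, collect(m.title) AS movies
// size() of the list equals the number of ACTED_IN matches for this actor
WHERE size(movies) = 5
RETURN a.name AS actor, movies
```

The key point is the same as in the second query above: nothing except a may appear unaggregated in the WITH clause, otherwise it becomes part of the grouping key.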

Related

Why is DISTINCT needed in this Cypher query?

The query below is taken from the Neo4j movie review dataset sandbox:
MATCH (u:User {name: "Some User"})-[r:RATED]->(m:Movie)
WITH u, avg(r.rating) AS mean
MATCH (u)-[r:RATED]->(m:Movie)-[:IN_GENRE]->(g:Genre)
WHERE r.rating > mean
WITH u, g, COUNT(*) AS score
MATCH (g)<-[:IN_GENRE]-(rec:Movie)
WHERE NOT EXISTS((u)-[:RATED]->(rec))
RETURN rec.title AS recommendation, rec.year AS year, COLLECT(DISTINCT g.name) AS genres, SUM(score) AS sscore
ORDER BY sscore DESC LIMIT 10
What I cannot understand is why the DISTINCT keyword is required in the query's RETURN clause, because the expected result of the last MATCH clause is something like this:
g1,x
g1,y
...
g2,z
g2,v
g2,m
...
gn,m
gn,b
gn,x
where g1, g2, ..., gn are the genres and x, y, z, v, m, b, ... are movies (there are also user and score columns, omitted here for readability).
So, according to my understanding, the query returns, for each movie, its genres and the sum of their scores.
Assumptions:
Every Movie has a unique title. (This is required for the query to work as is.)
Every Genre has a unique name.
Every Movie has at most one IN_GENRE relationship to each distinct Genre.
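Given the above assumptions, you are correct that the DISTINCT is not necessary. That is because the RETURN clause uses rec.title (together with rec.year) as the aggregation grouping keys, so each group holds the rows for a single movie, and within that group each genre can appear at most once.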
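Whether the third assumption holds for a particular database can be checked directly. The following is a hedged sketch, assuming the Movie, Genre and IN_GENRE names from the query in the question. If it returns no rows, no movie has more than one IN_GENRE relationship to the same genre:

```cypher
// Count IN_GENRE relationships per (movie, genre) pair
MATCH (m:Movie)-[r:IN_GENRE]->(g:Genre)
WITH m, g, count(r) AS rels
// Keep only pairs connected by more than one relationship
WHERE rels > 1
RETURN m.title AS movie, g.name AS genre, rels
```

If this does return rows, or if titles or genre names are not unique, keeping the DISTINCT is the safer choice.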

Neo4j: query to find nodes with the most relationships

I'm trying to find which movie in my database has the largest number of actors. Here's what I came up with, but it keeps returning a blank result.
MATCH (m:Movie)
WITH m, SIZE(()-[:ACTED_IN]->(m)) as actorCnt
MATCH (a)-[:ACTED_IN]->(m)
RETURN m, a
Maybe you did not wait long enough, because your query is trying to return all the actors for every movie. It also computes actorCnt but never uses it for ordering or filtering.
This query should return a list of the actors for the (single) movie with the most actors:
MATCH (m:Movie)
WITH m
ORDER BY SIZE(()-[:ACTED_IN]->(m)) DESC
LIMIT 1
RETURN m, [(a)-[:ACTED_IN]->(m)|a] AS actors
It orders the movies by descending number of actors, takes just the first one, and returns it and a list of all its actors.
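The same idea can be sketched with plain aggregation instead of SIZE() applied to a pattern, which may help on newer Neo4j releases that no longer accept that form. This assumes, as in the question, that ACTED_IN relationships point from actors to Movie nodes:

```cypher
// Start from the ACTED_IN relationships and group by movie
MATCH (a)-[:ACTED_IN]->(m:Movie)
WITH m, collect(a) AS actors
// Largest cast first; ties are broken arbitrarily
ORDER BY size(actors) DESC
LIMIT 1
RETURN m, actors
```

Movies without any actors never match here, which does not matter when looking for the maximum.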

Using UNWIND on a list I created to return multiple values (Cypher)

I am using the "Movies" database in Neo4j to simplify my question (type :play movies in the query box of an empty sandbox). For a list of 3 actors that I specify, I want to determine the total number of movies they've worked on, the number of movies they've acted in, and the number of movies they've directed (if any). Here is what I came up with:
MATCH (p:Person)-->(m:Movie)
WITH p, m, count(m) AS total
MATCH (p)-[:ACTED_IN]->(m)
WITH p, m, total, count(DISTINCT m) AS actedIn
MATCH (p)-[:DIRECTED]->(m)
WITH p, m, total, actedIn, count(DISTINCT m) AS directed
UNWIND ["Tom Hanks", "Clint Eastwood", "Charlize Theron"] AS actors
RETURN DISTINCT actors, total, actedIn, directed
Currently, it is returning that each actor acted in 1 movie and directed 1 movie, which is incorrect. I need to keep the WITH clauses in the query, and I need to define the list of actors.
In the real query I am working on, which this simpler one stands in for, the same thing happens: each element of the list I defined returns the same numbers as every other element. I am not sure what I am doing wrong here.
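I think this query will work for you.
In your query, carrying m through each WITH makes it a grouping key, so the actedIn and directed counts are always 1, and the final UNWIND does not filter by name; it just pairs every list element with every remaining row.
Since every person has been involved in a movie in some capacity, the first MATCH can assert that, and the subsequent ones can be optional.
// Match only the people in your list, with every movie they worked on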
MATCH (p:Person)-->(m:Movie)
WHERE p.name IN ["Tom Hanks", "Clint Eastwood", "Charlize Theron"]
// carry the people and the total movies per person
WITH p, count(m) AS total
// find the movies those people acted in
OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
// carry the people, total movies and the movies acted in
WITH p, total, count(m) AS actedIn
// find the movies they directed
OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
RETURN p.name, total, actedIn, count(m) AS directed
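One detail worth checking: (p)-->(m) matches once per relationship, so a person who both acted in and directed the same movie contributes two rows for it to count(m). If total is meant to be the number of distinct movies, a variant along these lines may be closer to the intent. It is a sketch under that assumption, and the variable names actedMovie and directedMovie are introduced here only for illustration:

```cypher
MATCH (p:Person)-->(m:Movie)
WHERE p.name IN ["Tom Hanks", "Clint Eastwood", "Charlize Theron"]
// DISTINCT collapses multiple relationships to the same movie
WITH p, count(DISTINCT m) AS total
OPTIONAL MATCH (p)-[:ACTED_IN]->(actedMovie:Movie)
WITH p, total, count(DISTINCT actedMovie) AS actedIn
OPTIONAL MATCH (p)-[:DIRECTED]->(directedMovie:Movie)
// count() ignores the nulls produced by an unmatched OPTIONAL MATCH
RETURN p.name AS name, total, actedIn, count(DISTINCT directedMovie) AS directed
```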

Aggregate count changes when returning multiple values

I am sending a Cypher query through PHP.
match (n:person)-[:watched]->(m:movie)
where m.Title in $mycollection
return count(distinct n.id);
This returns the number of people who have watched movies in my collection.
I actually want to return the list of names, and return n.name works fine.
When I try to return n.name and count(distinct n.id) at the same time, I lose the total count and get the count per row.
match (n:person)-[:watched]->(m:movie)
where m.Title in $mycollection
return n.name, count(distinct n.id);
This does not work: the count column appears as 1 for each row.
As I'm using PHP, I've also tried:
$count = $result->getNodesCount();
to no avail. So I'm using PHP to count the array. But it feels like Cypher should be able to do it, right?
RETURN n.name, count(DISTINCT n.id) groups the rows by n.name and counts the distinct n.id values within each group. If each name belongs to a single person, that count must always be 1.
If you are actually looking for the number of times each person had an outgoing relationship to a movie whose title is in $mycollection, do this instead (where count(*) counts the number of times a given n.name was matched):
MATCH (n:person)-->(m:movie)
WHERE m.Title in $mycollection
RETURN n.name, count(*);
Note that the above query omits the [watched] pattern found in your query, since that syntax (with no colon before watched) does no filtering at all. It merely assigns the relationship to a variable named watched, but that variable is not otherwise used, and is therefore superfluous.
If you had intended to use watched as the relationship type, then do this instead:
MATCH (n:person)-[:watched]->(m:movie)
WHERE m.Title in $mycollection
RETURN n.name, count(*);
This modified query returns the number of times each person watched a movie whose title is in $mycollection.
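If the goal is the list of names together with a single overall total, one possible approach is to collect the names into one list and take its size. This is a sketch that reuses the labels, relationship type and $mycollection parameter from the question, and it assumes that names are unique per person (it counts distinct names, not distinct ids):

```cypher
MATCH (n:person)-[:watched]->(m:movie)
WHERE m.Title IN $mycollection
// No grouping key here, so all matched rows fall into a single group
WITH collect(DISTINCT n.name) AS names
RETURN names, size(names) AS total
```

This returns one row, with the names as a list and the total alongside it, so no counting is needed on the PHP side.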

Use vars from before WITH statement in RETURN statement in Neo4j Cypher

I'm starting with Neo4j (v2.1.5) and I'm having an issue with the following Cypher query:
MATCH (actor:Person{name:"Tom Cruise"})-[role:ACTED_IN]->(movies)<-[r:ACTED_IN]-(coactors)
WITH coactors, count(coactors) as TimesCoacted
RETURN coactors.name, avg(TimesCoacted)
ORDER BY avg(TimesCoacted) DESC
It is based on the mini movie graph which comes with Neo4j installation.
Everything works fine, it shows all coactors which coacted in movies with Tom Cruise and how many times they coacted together, but the problem occurs when I want to list in which movies they coacted. Placing 'movies' variable in RETURN statement is throwing the following error:
movies not defined (line 3, column 9)
"RETURN movies, coactors.name, avg(TimesCoacted)"
^
Is there any way I can do it in one query?
Try the following:
MATCH
(actor:Person{name:"Tom Cruise"})-[role:ACTED_IN]->(movies)<-[r:ACTED_IN]-(coactors)
WITH
coactors,
count(coactors) as TimesCoacted,
movies // You have to declare "movies" here in order to use it later in the query
RETURN
movies,
coactors.name,
avg(TimesCoacted)
ORDER BY
avg(TimesCoacted) DESC
What you define in the WITH clause is the only thing available for further processing. In the original question, movies was not carried over to the next part of the query (it was not listed in WITH), and therefore it could not be used in the RETURN clause.
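The scoping rule, and its side effect on aggregation, can be sketched as follows. This assumes the same mini movie graph; the names movie, coactor and timesCoacted are chosen here only for illustration:

```cypher
MATCH (actor:Person {name: "Tom Cruise"})-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(coactor)
// Only coactor and timesCoacted survive this WITH; actor and movie are dropped.
WITH coactor, count(movie) AS timesCoacted
// Referencing movie below this point would raise a "not defined" error.
// Adding movie to the WITH above would group per (coactor, movie) pair,
// so the count would typically be 1 for every row.
RETURN coactor.name AS coactor, timesCoacted
ORDER BY timesCoacted DESC
```

That second point is why carrying an aggregate such as a collected list of titles, rather than the movie node itself, keeps the per-coactor count intact.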
Edit: After updates from the OP the following was added.
Another example: if you wish to count the number of times the actors have co-acted in a movie and list the movie titles as well, try the following:
MATCH
(actor:Person {name:"Tom Cruise"})-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(coactor:Person)
WITH
actor,
coactor,
collect (distinct movie.title) as movieTitles
RETURN
actor.name as actorName,
coactor.name as coactorName,
movieTitles,
size(movieTitles) as numberOfMovies
MATCH
(actor:Person{name:"Tom Cruise"})-[role:ACTED_IN]->(movies)<-[r:ACTED_IN]-(coactors)
WITH
coactors,
count(coactors) as TimesCoacted,
collect(DISTINCT movies.title) as movies // <=- this line was crucial!
RETURN
movies,
coactors.name,
avg(TimesCoacted)
ORDER BY
avg(TimesCoacted) DESC
