Neo4j: query to find 2 node with the most - neo4j

I'm trying to find which pair of actors has acted together in the most movies in my database, but my query keeps returning nothing. Any suggestions?
MATCH (actor1:Actor)<-[st1:ACTED_IN]-(mv1:Movie)-[st2:ACTED_IN]->(actor2:Actor)
RETURN distinct actor1,actor2,count(mv1)

It looks like you have written the relationship arrows in the reverse direction.
The ACTED_IN relationship probably goes from Actor to Movie:
MATCH (actor1:Actor)-[st1:ACTED_IN]->(mv1:Movie)<-[st2:ACTED_IN]-(actor2:Actor)
RETURN distinct actor1, actor2, count(mv1)
Although you are using DISTINCT, this query still returns every pair twice, because the following two rows are different records:
actor1, actor2, movie_count
actor2, actor1, movie_count
To get rid of these duplicate entries, you can use the simple trick of comparing the ids of the nodes:
MATCH (actor1:Actor)-[:ACTED_IN]->(mv1:Movie)<-[:ACTED_IN]-(actor2:Actor)
WHERE id(actor1)>id(actor2)
RETURN actor1,actor2,count(mv1)
To find the pair of actors who acted together in the most movies:
MATCH (actor1:Actor)-[:ACTED_IN]->(mv1:Movie)<-[:ACTED_IN]-(actor2:Actor)
WHERE id(actor1)>id(actor2)
RETURN actor1,actor2,count(mv1) AS movie_count
ORDER BY movie_count DESC
LIMIT 1
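The effect of the id comparison can be checked outside the database. The following is a minimal Python sketch, not Cypher: the `cast` dictionary and the function name `shared_movie_counts` are made up for illustration, and sorting each pair plays the role of `id(actor1) > id(actor2)` by giving every unordered pair a single canonical form.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: movie title -> actors who appear in it.
cast = {
    "Movie A": ["Ann", "Bob", "Cy"],
    "Movie B": ["Ann", "Bob"],
    "Movie C": ["Bob", "Cy"],
    "Movie D": ["Bob", "Ann"],
}

def shared_movie_counts(cast):
    """Count shared movies per unordered actor pair."""
    counts = Counter()
    for actors in cast.values():
        # sorted() + combinations() yields each pair once per movie,
        # so (Ann, Bob) and (Bob, Ann) never appear as separate keys.
        for pair in combinations(sorted(set(actors)), 2):
            counts[pair] += 1
    return counts

counts = shared_movie_counts(cast)
# Equivalent of ORDER BY movie_count DESC LIMIT 1.
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Without the canonical ordering, each pair would show up twice with the same count, which is the duplicate described above.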

Related

Why is DISTINCT needed in this Cypher query?

The query below is taken from the Neo4j movie review dataset sandbox:
MATCH (u:User {name: "Some User"})-[r:RATED]->(m:Movie)
WITH u, avg(r.rating) AS mean
MATCH (u)-[r:RATED]->(m:Movie)-[:IN_GENRE]->(g:Genre)
WHERE r.rating > mean
WITH u, g, COUNT(*) AS score
MATCH (g)<-[:IN_GENRE]-(rec:Movie)
WHERE NOT EXISTS((u)-[:RATED]->(rec))
RETURN rec.title AS recommendation, rec.year AS year, COLLECT(DISTINCT g.name) AS genres, SUM(score) AS sscore
ORDER BY sscore DESC LIMIT 10
What I cannot understand is why the DISTINCT keyword is required in the query's RETURN clause. The expected result of the last MATCH clause is something like this:
g1,x
g1,y
...
g2,z
g2,v
g2,m
...
gn,m
gn,b
gn,x
where g1, g2, ..., gn are the genres and x, y, z, v, m, b, ... are movies (the user and score columns are omitted for readability).
So, as I understand it, the query returns, for each movie, its genres and the sum of their scores.
Assumptions:
Every Movie has a unique title. (This is required for the query to work as is.)
Every Genre has a unique name.
Every Movie has at most one IN_GENRE relationship to each distinct Genre.
Given the above assumptions, you are correct that the DISTINCT is not necessary. The RETURN clause uses rec.title as one of the aggregation grouping keys, and the preceding WITH clause produces one row per genre, so each genre can be paired with a given movie at most once.
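The grouping argument can be illustrated with a minimal Python sketch. The rows below are invented sample data standing in for the output of the last MATCH, already satisfying the three assumptions; the dictionaries `genres` and `sscore` stand in for COLLECT(g.name) without DISTINCT and SUM(score).

```python
from collections import defaultdict

# Hypothetical rows from the last MATCH: (genre name, score, movie title).
# A genre appears at most once per movie, as the assumptions require.
rows = [
    ("Comedy", 4, "Movie X"),
    ("Comedy", 4, "Movie Y"),
    ("Drama", 2, "Movie X"),
    ("Drama", 2, "Movie Z"),
]

genres = defaultdict(list)  # COLLECT(g.name), deliberately without DISTINCT
sscore = defaultdict(int)   # SUM(score)
for genre, score, title in rows:
    # The movie title is the grouping key for both aggregates.
    genres[title].append(genre)
    sscore[title] += score

# ORDER BY sscore DESC
ranked = sorted(sscore, key=lambda title: sscore[title], reverse=True)
for title in ranked:
    print(title, genres[title], sscore[title])
```

If a movie could have two IN_GENRE relationships to the same genre, the collected list would contain that genre twice, and DISTINCT would then make a difference.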

Neo4j: query to find nodes with most relationship

I'm trying to find which movie in my database has the most actors. Here's what I came up with, but it keeps returning nothing.
MATCH (m:Movie)
WITH m, SIZE(()-[:ACTED_IN]->(m)) as actorCnt
MATCH (a)-[:ACTED_IN]->(m)
RETURN m, a
Maybe you did not wait long enough, because your query is trying to return all the actors for every movie.
This query should return a list of the actors for the (single) movie with the most actors:
MATCH (m:Movie)
WITH m
ORDER BY SIZE(()-[:ACTED_IN]->(m)) DESC
LIMIT 1
RETURN m, [(a)-[:ACTED_IN]->(m)|a] AS actors
It orders the movies by descending number of actors, takes just the first one, and returns it and a list of all its actors.
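The same order-then-limit idea can be sketched in Python on in-memory data. This is only an illustration of the logic, with a made-up `cast` dictionary in place of the graph.

```python
# Hypothetical sample data: movie title -> actors with an ACTED_IN link to it.
cast = {
    "Movie A": ["Ann", "Bob", "Cy"],
    "Movie B": ["Ann", "Bob"],
    "Movie C": ["Dee"],
}

# ORDER BY number of actors DESC, then LIMIT 1.
ranked = sorted(cast.items(), key=lambda item: len(item[1]), reverse=True)
top_movie, top_actors = ranked[0]
print(top_movie, top_actors)
```

As with LIMIT 1 in the query, only one movie is kept when several are tied for the largest cast.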

Need help figuring out the number of times 2 actors acted together in the most number of movies (cypher)

I wrote the query below to list pairs of actors who have acted together in different movies, along with the number of times each pair acted together.
match (person1)-[:ACTED_IN]->(Movie)<-[:ACTED_IN]-(person2)
return DISTINCT Movie.title, person1.name, person2.name, count(Movie) AS pairs
ORDER BY pairs DESC
I feel it's inaccurate, though, because it counts duplicate instances: x and y are counted as one instance, and y and x are counted as a separate instance. Does anyone know how to fix this?
Any help would be appreciated.
Yes, you can add a condition for this, for example:
MATCH (person1)-[:ACTED_IN]->(Movie)<-[:ACTED_IN]-(person2)
WHERE id(person1) < id(person2)
RETURN DISTINCT Movie.title,
       person1.name,
       person2.name,
       count(Movie) AS pairs
ORDER BY pairs DESC
To properly see the movies that every pair of actors appeared in (in descending order of shared movies), this should work:
MATCH (p1:Person)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(p2:Person)
RETURN p1.name, p2.name, COLLECT(m) AS movies, COUNT(m) AS pairs
ORDER BY pairs DESC
This query takes into account the fact that aggregating functions (like COUNT and COLLECT) use the non-aggregating elements in the same WITH or RETURN clause as "grouping keys". Also, DISTINCT is not needed, since aggregating functions implicitly return distinct rows (if there are any grouping keys).
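To see why the choice of grouping keys matters, here is a minimal Python sketch with invented rows. It compares grouping by title plus pair (as happens when Movie.title is returned next to the count) with grouping by pair only.

```python
from collections import Counter

# Hypothetical rows matched by the pattern: (movie title, person1, person2).
rows = [
    ("Movie A", "Ann", "Bob"),
    ("Movie B", "Ann", "Bob"),
    ("Movie A", "Ann", "Cy"),
]

# Grouping by (title, person1, person2): every group holds exactly one row,
# so the count is always 1.
by_title_and_pair = Counter(rows)

# Grouping by (person1, person2) only: rows for the same pair are counted together.
by_pair = Counter((p1, p2) for _, p1, p2 in rows)

print(by_title_and_pair)
print(by_pair)
```

In this sketch the per-pair totals only appear once the title is dropped from the grouping keys, which is what collecting the movies into a list achieves.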

Cypher query help: Order query results by content of property array

I have a bunch of venues in my Neo4j DB. Each venue object has a property 'catIds', which is an array containing the ids of the venue types it belongs to. I want to query the database so that I get all venues, ordered by how well their catIds match or contain some of a list of ids that I pass to the query. I hope that makes sense :)
Please, could someone point me in the direction of how to write this query?
Since you're working in a graph database you could think about modeling your data in the graph, not in a property where it's hard to get at it. For example, in this case you might create a bunch of (v:venue) nodes and a bunch of (t:type) nodes, then link them by an [:is] relation. Each venue is linked to one or more type nodes. Each type node has an 'id' property: {id:'t1'}, {id:'t2'}, etc.
Then you could do a query like this:
match (v:venue)-[r:is]->(:type) return v, count(r) as n order by n desc;
This finds all your venues, along with ALL their type relations and returns them ordered by how many type-relations they have.
If you only want to get nodes of particular venue types on your list:
match (v:venue)-[r:is]-(t:type) where t.id in ['t1','t2'] return v, count(r) as n order by n desc;
And if you want ALL venues but rank ordered according to how well they fit your list, as I think you were looking for:
match (v:venue) optional match (v)-[r:is]->(t:type) where t.id in ['t1','t2'] return v, count(r) as n order by n desc;
The match will get all your venues; the optional match will find relations on your list if the node has any. If a node has no links on your list, the optional match yields null for r, so count(r) is 0 for that venue and it sorts to the bottom.
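The ranking itself can be sketched in Python against the original data shape, where each venue carries a catIds array. The venue names, the ids, and the helper `match_count` are hypothetical; the point is only that venues with no matching id get a count of 0 and still appear, at the bottom.

```python
# Hypothetical venues, each with a 'catIds' array as in the question.
venues = [
    {"name": "Cafe", "catIds": ["t1", "t3"]},
    {"name": "Bar", "catIds": ["t1", "t2"]},
    {"name": "Gym", "catIds": ["t4"]},
]

# The list of ids handed to the query.
wanted = {"t1", "t2"}

def match_count(venue):
    """Number of wanted ids present in the venue's catIds (0 if none)."""
    return len(wanted.intersection(venue["catIds"]))

# All venues are kept; the best matches come first.
ranked = sorted(venues, key=match_count, reverse=True)
for venue in ranked:
    print(venue["name"], match_count(venue))
```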

Use vars from before WITH statement in RETURN statement in Neo4j Cypher

I'm starting with Neo4j (v2.1.5) and I'm having an issue with the following Cypher query:
MATCH (actor:Person{name:"Tom Cruise"})-[role:ACTED_IN]->(movies)<-[r:ACTED_IN]-(coactors)
WITH coactors, count(coactors) as TimesCoacted
RETURN coactors.name, avg(TimesCoacted)
ORDER BY avg(TimesCoacted) DESC
It is based on the mini movie graph which comes with Neo4j installation.
Everything works fine, it shows all coactors which coacted in movies with Tom Cruise and how many times they coacted together, but the problem occurs when I want to list in which movies they coacted. Placing 'movies' variable in RETURN statement is throwing the following error:
movies not defined (line 3, column 9)
"RETURN movies, coactors.name, avg(TimesCoacted)"
^
Is there any way I can do it in one query?
Try the following:
MATCH
(actor:Person{name:"Tom Cruise"})-[role:ACTED_IN]->(movies)<-[r:ACTED_IN]-(coactors)
WITH
coactors,
count(coactors) as TimesCoacted,
movies // You have to declare "movies" here in order to use it later in the query
RETURN
movies,
coactors.name,
avg(TimesCoacted)
ORDER BY
avg(TimesCoacted) DESC
Only what you list in the WITH clause is available for further processing. In the original question, movies was not carried over to the next part of the query (it was not part of WITH), so it could not be used in the RETURN clause. Note that adding movies to WITH also makes it a grouping key, so count(coactors) is then computed per coactor and movie rather than per coactor.
Edit: After updates from the OP the following was added.
Another example: if you wish to count the number of times the actors have coacted and list the movie titles as well, try the following:
MATCH
(actor:Person {name:"Tom Cruise"})-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(coactor:Person)
WITH
actor,
coactor,
collect (distinct movie.title) as movieTitles
RETURN
actor.name as actorName,
coactor.name as coactorName,
movieTitles,
size(movieTitles) as numberOfMovies
Alternatively, the original query can be kept and the movie titles collected in its WITH clause:
MATCH
(actor:Person{name:"Tom Cruise"})-[role:ACTED_IN]->(movies)<-[r:ACTED_IN]-(coactors)
WITH
coactors,
count(coactors) as TimesCoacted,
collect(DISTINCT movies.title) as movies // <=- this line was crucial!
RETURN
movies,
coactors.name,
avg(TimesCoacted)
ORDER BY
avg(TimesCoacted) DESC
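The idea of collecting the titles per coactor, instead of carrying every movie along as a separate row, can be sketched in Python. The rows are invented sample data, and the dictionary `titles_by_coactor` stands in for the grouped result of the WITH clause.

```python
from collections import defaultdict

# Hypothetical rows: one per (movie title, coactor) match for the chosen actor.
rows = [
    ("Movie A", "Bea"),
    ("Movie B", "Bea"),
    ("Movie A", "Dan"),
]

# Group by coactor and collect distinct titles, like collect(DISTINCT movie.title).
titles_by_coactor = defaultdict(list)
for title, coactor in rows:
    if title not in titles_by_coactor[coactor]:
        titles_by_coactor[coactor].append(title)

# One result row per coactor: name, titles, number of shared movies,
# ordered by the number of shared movies, descending.
result = sorted(
    ((coactor, titles, len(titles)) for coactor, titles in titles_by_coactor.items()),
    key=lambda row: row[2],
    reverse=True,
)
for row in result:
    print(row)
```

Because the coactor is the only grouping key here, the count stays per coactor while the titles remain available in the same row.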
