I am new to Neo4j and want to know how to write queries against the movies dataset such as "Which movie has the most actors in it?" and "Which pair of actors has acted together in the most movies?"
These questions are not new. You just need to know how to find the existing answers.
Which movie has the most number of actors in it?
Find the movie with the largest cast, out of the list of movies that have a review
Which pair of actors have acted together in most number of movies?
Pair of Actors with Most Occurrences
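For reference, here is a minimal sketch of both queries. It assumes the schema of the built-in movies dataset, that is (:Person)-[:ACTED_IN]->(:Movie) with name and title properties; the aliases castSize and moviesTogether are made up for illustration.

```cypher
// Movie with the largest cast: count actors per movie, keep the top row
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
RETURN m.title AS movie, count(p) AS castSize
ORDER BY castSize DESC
LIMIT 1;

// Pair of actors with the most shared movies
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(b:Person)
WHERE a.name < b.name   // keep each pair once instead of as (A, B) and (B, A)
RETURN a.name AS actor1, b.name AS actor2, count(m) AS moviesTogether
ORDER BY moviesTogether DESC
LIMIT 1;
```

The a.name < b.name filter assumes names are unique. LIMIT 1 returns a single row even when several movies or pairs tie for first place.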
Related
I'm learning Neo4j and I see that some MATCH clauses can return the same node multiple times, so you need to specify DISTINCT to eliminate duplicates, whether they are nodes or aggregated values, as in:
MATCH (p:Person)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d:Person)
WITH p, d, collect(DISTINCT m.title) as movies
RETURN p.name as Actor, movies AS Movies, d.name AS Director
I'm wondering in what cases would one want to keep duplicates.
Many thanks
There are no specific cases in which you always want duplicates. It depends on the functionality you are trying to achieve.
Consider this scenario: we want all the movie titles to which a person is directly linked in any way. In this case you'll probably use DISTINCT, because a Person can be linked to the same movie both as an actor and as a director. The query will be:
MATCH (p:Person)-[]->(m)
WITH p, collect(DISTINCT m.title) as movies
RETURN p.name as Actor, movies AS Movies
In another scenario, you just want the movies a person is linked to as an actor. In this case, there is no need to use DISTINCT, because a person will be linked to a movie as an Actor, ideally only once. So this would suffice:
MATCH (p:Person)-[:ACTED_IN]->(m)
WITH p, collect(m.title) as movies
RETURN p.name as Actor, movies AS Movies
DISTINCT is mostly used in aggregations, to remove duplicates from counts, lists, and so on. You can also use DISTINCT to remove duplicate rows from the output, but again this depends on the functionality you need; there are no hard and fast rules. If your query returns duplicates and you don't want them, use DISTINCT; otherwise, leave it as it is.
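As a concrete illustration of the difference, the sketch below counts the same pattern both ways. It assumes the movies dataset schema used in the queries above; the aliases links and movies are illustrative.

```cypher
MATCH (p:Person)-[]->(m:Movie)
RETURN p.name AS person,
       count(m) AS links,            // one per relationship: a movie both acted in and directed counts twice
       count(DISTINCT m) AS movies   // each movie counted once
ORDER BY links DESC
```

Keeping duplicates makes sense when each row carries meaning of its own, for example when counting relationships or averaging a value stored per relationship. Dropping them makes sense when you only care about the set of nodes.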
I have a graph model of movies and actors. The relation between the two is "require". The graph is shown below. A and B are movies; 1, 2, 3, 4, 5, and 6 are actors. Movie A requires actors 1, 2, 3, and 4. Movie B requires actors 4, 5, and 6. We can see that actor 4 is shared by both movies.
Current Query:
MATCH (m :Movie) -[r :require]-> (a :actor)
RETURN m,r,a;
Current Output
Expected Output
I want to display something as below. Here actor 4 is shown once for each movie. Can someone help me fix this ?
The visualization logic in Neo4j Browser displays each node only once, therefore you cannot get node 4 two times.
A workaround would be to use the Neo4j APOC library and return virtual nodes as copies of the blue nodes instead of the real nodes. If you create two virtual nodes out of node 4, the UI considers them distinct and therefore shows two nodes.
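A rough sketch of that workaround, assuming the APOC library is installed and using its apoc.create.vNode and apoc.create.vRelationship functions; actorCopy and rel are illustrative names:

```cypher
MATCH (m:Movie)-[r:require]->(a:actor)
// build one virtual copy of the actor per matched row,
// so an actor shared by two movies yields two separate copies
WITH m, r, apoc.create.vNode(labels(a), properties(a)) AS actorCopy
RETURN m, actorCopy,
       apoc.create.vRelationship(m, type(r), properties(r), actorCopy) AS rel
```

Virtual nodes and relationships exist only in the query result; nothing is written to the database.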
You can do that by grouping the actors by movie, so that you end up with one row per movie: movie, [list of actors]
MATCH (m:Movie)-[r:require]->(a :actor)
WITH m, collect(a) AS actors
RETURN m, actors
Excuse the bad title, I'm a beginner with Cypher and Graph databases in general. I'm not sure if the title fully captures what I am trying to ask, please let me know if you have any better titles!
I have a very simple graph setup with User nodes and Movie nodes. There is a relationship from a User to a Movie called :REVIEWED, with a rating property that carries the user's rating (1.0-5.0 inclusive). See the diagram below:
I think this design makes sense for a movie system that captures user reviews. I don't think reviews should exist as their own nodes, because they are better represented as a relationship between the user (reviewer) and the movie. Besides, the whole point of allowing properties on relationships is to express scale, weight, or metadata, and this is a great use case for them. However, because of this design I have been having a hard time coming up with a Cypher query to do the following:
Find the top ten movies with at least one review rating less than 3.
That is, we want to sort the movies by their average rating, but each movie must have at least one review with a score below 3.0. The query I used to sort the movies by their average rating is:
MATCH (movie:Movie)<-[review:REVIEWED]-(user:User)
RETURN movie.movieTitle, avg(review.rating) as avgRating
ORDER BY avgRating DESC
LIMIT 10
This makes sense to me, however when I try to limit the path to reviews with a rating less than 3, see below:
MATCH (movie:Movie)<-[review:REVIEWED]-(:User)
WHERE review.rating < 3
RETURN movie.movieTitle, avg(review.rating) as avgRating
ORDER BY avgRating DESC
LIMIT 10
Only the paths that have relationships with a rating less than 3 get matched, which is what I should get. However, the issue is when we average the ratings it's only averaging the ratings less than 3.0.
Ideally we want all the reviews for a movie, as long as that movie has at least one review with a rating less than 3.0, regardless of whether that review is in the matched path. This is where I am getting confused. Because Cypher uses patterns to match paths in the graph, how can we check all paths from a node to see whether a condition is satisfied, and then continue to match all paths based on that result?
Looking forward to hearing what you guys think, thanks in advance!
You need a two-part query: first match the movies that have a review score under 3, then average all of their ratings:
MATCH (movie:Movie)<-[review:REVIEWED]-(:User)
WHERE review.rating < 3
WITH DISTINCT movie
MATCH (movie)<-[review:REVIEWED]-(:User)
RETURN movie.movieTitle, avg(review.rating) as avgRating
ORDER BY avgRating DESC
LIMIT 10
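An alternative sketch of the same idea in a single pass: aggregate per movie first, then filter on the lowest rating. It uses the labels and properties from the question; minRating is an illustrative alias.

```cypher
MATCH (movie:Movie)<-[review:REVIEWED]-(:User)
// averaging over all reviews, while also tracking the lowest one
WITH movie, avg(review.rating) AS avgRating, min(review.rating) AS minRating
WHERE minRating < 3   // keep only movies with at least one review below 3
RETURN movie.movieTitle, avgRating
ORDER BY avgRating DESC
LIMIT 10
```

Because the filter runs after the aggregation, the average still covers every review of the movie, not just the low ones.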
My question is nearly the same as "Finding triplets having highest common relationships in Neo4j", but the answer provided there does not solve my problem.
My database has two types of nodes. Movies and Actors.
I want to find the most frequent pairs of co-actors for a given actor. So, if an actor has acted in 100 movies, I want to find most common pairs of co-actors in those 100 movies.
Database type
(:Movie)-[:STARS]->(:Actor)
I can get the most frequent co-actors, in descending order of frequency, using the following Cypher query
MATCH (actor1:Actor)<-[st1:STARS]-(mv1:Movie)-[st2:STARS]->(actor2:Actor)
WHERE actor1.name = "Tom"
RETURN actor1,actor2,count(mv1)
ORDER BY count(mv1) DESC
which is mentioned in that question itself. But the answer posted there does not return pairs of co-actors.
The answer I am looking for is as follows
Actor1 Actor2 Actor3 Count(Movie)
A B C 20
A B D 10
This table hypothetically implies that A has starred in 25 unique movies, and that in those movies B and C appear together in 20, while B and D appear together in 10.
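One possible way to produce such a table, as a sketch only: match two further co-actors on the same movie and count the movies per triplet. It assumes the (:Movie)-[:STARS]->(:Actor) model above, a name property on actors, and unique actor names; movieCount is an illustrative alias.

```cypher
MATCH (a1:Actor {name: "Tom"})<-[:STARS]-(mv:Movie)-[:STARS]->(a2:Actor),
      (mv)-[:STARS]->(a3:Actor)
// the name comparison keeps each co-actor pair once instead of as (B, C) and (C, B)
WHERE a2.name < a3.name
RETURN a1.name AS Actor1, a2.name AS Actor2, a3.name AS Actor3,
       count(DISTINCT mv) AS movieCount
ORDER BY movieCount DESC
```

Within a single MATCH, Cypher does not reuse a relationship, so as long as each movie has at most one STARS relationship per actor, the three actors in a row are always different people.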
I am trying to build a basic recommendation system. I first get what the people who liked this movie also liked (user-based collaborative filtering), which gives me a chunk of varied movies, because, say, people who liked Toy Story may also like sci-fi movies. But movies of that type are largely unrelated to Toy Story, so I want to filter the results again by genre. Toy Story has 5 genres (Animation, Action, Adventure, etc.), and I want to get only the movies that share these genres.
This is my Cypher query:
match (x)<-[:HAS_GENRE]-(ee:Movie{id:1})<-[:RATED{rating: 5}]
-(usr)-[:RATED{rating: 5}]->(another_movie)<-[:LINK]-(l2:Link),
(another_movie)-[:HAS_GENRE]->(y:Genre)
WHERE ALL (m IN x.name WHERE m IN y.name)
return distinct y.name, another_movie, l2.tmdbId limit 200
The first record I get back is Star Wars (1977), which matches Toy Story on only the Adventure genre. Please help me write better Cypher.
There are a few things we can do to improve the query.
Collecting the genres should allow for the correct WHERE ALL clause later. We can also hold off on matching to the recommended movie's Link node until we filter down to the movies we want to return.
Give this one a try:
MATCH (x)<-[:HAS_GENRE]-(ee:Movie{id:1})
// collect genres so only one result row so far
WITH ee, COLLECT(x) as genres
MATCH (ee)<-[:RATED{rating: 5}]-()-[:RATED{rating: 5}]->(another_movie)
WITH DISTINCT genres, another_movie
// don't match on genre until previous query filters results on rating
MATCH (another_movie)-[:HAS_GENRE]->(y:Genre)
WITH genres, another_movie, COLLECT(y) as gs
WHERE size(genres) <= size(gs) AND ALL (genre IN genres WHERE genre IN gs)
WITH another_movie limit 200
// only after we limit results should we match to the link
MATCH (another_movie)<-[:LINK]-(l2:Link)
RETURN another_movie, l2.tmdbId
As movies are likely going to have many many ratings, the match to find movies both rated 5 is going to be the most expensive part of the query. If many of your queries rely on a rating of 5, you may want to consider creating a separate [:MAX_RATED] relationship whenever a user rates a movie a 5, and use those [:MAX_RATED] relationships for queries like these. That ensures that you don't initially match to a ton of rated movies that all have to be filtered by their rating value.
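A sketch of how such relationships could be backfilled from existing ratings and then used, assuming the RATED relationships and rating property from the question; MAX_RATED is the relationship type suggested above:

```cypher
// one-off backfill: add a MAX_RATED relationship for every existing 5-star rating
MATCH (u)-[:RATED {rating: 5}]->(m:Movie)
MERGE (u)-[:MAX_RATED]->(m);

// the expensive match then no longer has to filter on a property value
MATCH (ee:Movie {id: 1})<-[:MAX_RATED]-()-[:MAX_RATED]->(another_movie)
RETURN DISTINCT another_movie;
```

New 5-star ratings would need the same MERGE at write time to keep the two relationship types in sync.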
Alternately, if you want to consider recommendations based on average ratings for movies, you may want to consider caching a computed average of all ratings for every movie (maybe the computation gets rerun for all movies a couple times a day). If you add an index on the average rating property on movie nodes, it should provide faster matching to movies that are rated similarly.
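A sketch of that caching step, assuming the index syntax of Neo4j 4.x or later; the avgRating property and the index name movie_avg_rating are illustrative:

```cypher
// recompute and store the average rating on every movie that has ratings
MATCH (m:Movie)<-[r:RATED]-()
WITH m, avg(r.rating) AS avgRating
SET m.avgRating = avgRating;

// index the cached value so lookups by average rating can use it
CREATE INDEX movie_avg_rating IF NOT EXISTS FOR (m:Movie) ON (m.avgRating);
```

The two statements are meant to be run separately: the first on whatever schedule suits the data, the second once.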