So, I am trying to build a basic recommendation system. I first get what the people who liked this movie also liked (user-based collaborative filtering), which gives me a mixed chunk of movies, because, let's say, people who liked Toy Story may also like sci-fi movies. But movies of that type are not very relevant to Toy Story, so I want to filter the results again by genre. Toy Story has 5 genres (Animation, Action, Adventure, etc.), and I want to get only movies that share these genres.
This is my Cypher query:
match (x)<-[:HAS_GENRE]-(ee:Movie{id:1})<-[:RATED{rating: 5}]
-(usr)-[:RATED{rating: 5}]->(another_movie)<-[:LINK]-(l2:Link),
(another_movie)-[:HAS_GENRE]->(y:Genre)
WHERE ALL (m IN x.name WHERE m IN y.name)
return distinct y.name, another_movie, l2.tmdbId limit 200
The first record I get back is Star Wars (1977), which matches only the Adventure genre among Toy Story's genres. Please help me write a better Cypher query.
There are a few things we can do to improve the query.
Collecting the genres should allow for the correct WHERE ALL clause later. We can also hold off on matching to the recommended movie's Link node until we filter down to the movies we want to return.
Give this one a try:
MATCH (x)<-[:HAS_GENRE]-(ee:Movie{id:1})
// collect genres so only one result row so far
WITH ee, COLLECT(x) as genres
MATCH (ee)<-[:RATED{rating: 5}]-()-[:RATED{rating: 5}]->(another_movie)
WITH DISTINCT genres, another_movie
// don't match on genre until previous query filters results on rating
MATCH (another_movie)-[:HAS_GENRE]->(y:Genre)
WITH genres, another_movie, COLLECT(y) as gs
WHERE size(genres) <= size(gs) AND ALL (genre IN genres WHERE genre IN gs)
WITH another_movie limit 200
// only after we limit results should we match to the link
MATCH (another_movie)<-[:LINK]-(l2:Link)
RETURN another_movie, l2.tmdbId
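To see why collecting the genres matters, here is a minimal plain-Python sketch of the same check. It is not Neo4j code; the function names and sample data are hypothetical. The point is that the test compares two whole genre sets per candidate movie, instead of one genre pair per row as in the original query.

```python
def shares_all_genres(seed_genres, candidate_genres):
    """Mirror of: size(genres) <= size(gs) AND ALL (genre IN genres WHERE genre IN gs)."""
    return len(seed_genres) <= len(candidate_genres) and all(
        genre in candidate_genres for genre in seed_genres
    )


def filter_candidates(seed, candidates, genres_by_movie, limit=200):
    """Keep only candidates whose genre set contains every genre of the seed movie."""
    seed_genres = genres_by_movie[seed]
    kept = [m for m in candidates if shares_all_genres(seed_genres, genres_by_movie[m])]
    return kept[:limit]


# Hypothetical sample data: movie -> set of genre names.
genres_by_movie = {
    "seed": {"Animation", "Adventure"},
    "full_match": {"Animation", "Adventure", "Comedy"},
    "partial_match": {"Adventure", "Sci-Fi"},
}
```

With this data, `filter_candidates("seed", ["full_match", "partial_match"], genres_by_movie)` keeps only `full_match`, because `partial_match` shares just one of the seed's genres.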
As movies are likely going to have many many ratings, the match to find movies both rated 5 is going to be the most expensive part of the query. If many of your queries rely on a rating of 5, you may want to consider creating a separate [:MAX_RATED] relationship whenever a user rates a movie a 5, and use those [:MAX_RATED] relationships for queries like these. That ensures that you don't initially match to a ton of rated movies that all have to be filtered by their rating value.
Alternately, if you want to consider recommendations based on average ratings for movies, you may want to consider caching a computed average of all ratings for every movie (maybe the computation gets rerun for all movies a couple times a day). If you add an index on the average rating property on movie nodes, it should provide faster matching to movies that are rated similarly.
Related
I am new to Neo4j and want to know how to query, for example in the movies dataset, "Which movie has the most actors in it?" and "Which pair of actors has acted together in the most movies?"
The questions below are not new. You just need to know how to find them.
Which movie has the most number of actors in it?
Find the movie with the largest cast, out of the list of movies that have a review
Which pair of actors have acted together in most number of movies?
Pair of Actors with Most Occurences
Excuse the bad title, I'm a beginner with Cypher and Graph databases in general. I'm not sure if the title fully captures what I am trying to ask, please let me know if you have any better titles!
I have a very simple graph setup with User nodes and Movie nodes, and there exists a relationship from a User to a Movie called :REVIEWED that has a rating property that carries the user's rating (1.0-5.0 inclusive). See the diagram below:
I think this design makes sense for a movie system that captures user reviews. I don't think reviews should exist as their own nodes, because they are better represented as a relationship between the user (reviewer) and the movie in a graph. Not to mention that the whole purpose of properties on relationships is to express scale/weight/metadata, and this is a great use case for them. However, due to this design I have been having a hard time coming up with a Cypher query to do the following:
Find the top ten movies with at least one review rating less than 3.
So that is, we want to sort the movies based on their average rating however at least one review must be less than a score of 3.0. The query I used to sort the movies based on their average rating is:
MATCH (movie:Movie)<-[review:REVIEWED]-(user:User)
RETURN movie.movieTitle, avg(review.rating) as avgRating
ORDER BY avgRating DESC
LIMIT 10
This makes sense to me, however when I try to limit the path to reviews with a rating less than 3, see below:
MATCH (movie:Movie)<-[review:REVIEWED]-(:User)
WHERE review.rating < 3
RETURN movie.movieTitle, avg(review.rating) as avgRating
ORDER BY avgRating DESC
LIMIT 10
Only the paths that have relationships with a rating less than 3 get matched, which is what I should get. However, the issue is when we average the ratings it's only averaging the ratings less than 3.0.
Ideally we want to have all the reviews for that movie as long as there exists a review for that movie with a rating less than 3.0 regardless of whether it is in the matched path. This is where I am getting confused. Because Cypher uses patterns to match paths in the graph how can we use it to check all paths from a node and see if a condition is satisfied and then continue to match all paths based on that result.
Looking forward to hearing what you guys think, thanks in advance!
You need a two-part query: first match the movies that have at least one review score under 3, then average all of their ratings:
MATCH (movie:Movie)<-[review:REVIEWED]-(:User)
WHERE review.rating < 3
WITH DISTINCT movie
MATCH (movie)<-[review:REVIEWED]-(:User)
RETURN movie.movieTitle, avg(review.rating) AS avgRating
ORDER BY avgRating DESC
LIMIT 10
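The same two-stage idea can be sketched in plain Python over an in-memory list of reviews. This is only an illustration of the logic, not Neo4j code; the function name and sample data are hypothetical. A movie qualifies if any of its reviews is below the threshold, but the average is taken over all of its reviews.

```python
from collections import defaultdict


def top_movies_with_low_review(reviews, threshold=3.0, limit=10):
    """reviews: iterable of (movie_title, rating) pairs."""
    ratings_by_movie = defaultdict(list)
    for title, rating in reviews:
        ratings_by_movie[title].append(rating)

    # Stage 1: keep movies with at least one rating below the threshold.
    # Stage 2: average over ALL ratings of the movies that were kept.
    averages = {
        title: sum(ratings) / len(ratings)
        for title, ratings in ratings_by_movie.items()
        if any(r < threshold for r in ratings)
    }
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Hypothetical sample data.
sample_reviews = [("A", 5.0), ("A", 1.0), ("B", 4.0), ("B", 5.0), ("C", 2.0)]
```

Here movie B is dropped because none of its reviews is under 3, while A is averaged over both of its reviews, not only the low one.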
I am using neo4j to setup a recommender system. I have the following setup:
Nodes:
Users
Movies
Movie attributes (e.g. genre)
Relationships
(m:Movie)-[w:WEIGHT {weight: 10}]->(a:Attribute)
(u:User)-[r:RATED {rating: 5}]->(m:Movie)
Here is a diagram of how it looks:
I am now trying to figure out how to apply a collaborative filtering scheme that works as follows:
Checks which attributes the user has liked (implicitly by liking the movies)
Find similar other users that have liked these similar attributes
Recommend the top movies to the user, which the user has NOT seen, but similar other users have seen.
The condition is obviously that each attribute has a certain weight for each movie. E.g. the genre adventure can have a weight of 10 for the Lord of the Rings but a weight of 5 for Titanic.
In addition, the system needs to take into account the ratings for each movie. E.g. if another user has rated Lord of the Rings 5, then his/her attributes of the Lord of the Rings are scaled by 5 and not 10. A user who has rated the implicit attributes close to 5 as well should then get this movie recommended, as opposed to another user who has rated similar attributes higher.
I made a start by simply recommending only other movies that other users have rated, but I am not sure how to take into account the relationships RATING and WEIGHT. It also did not work:
MATCH (user:User)-[:RATED]->(movie1)<-[:RATED]-(ouser:User),
(ouser)-[:RATED]->(movie2)<-[:RATED]-(oouser:User)
WHERE user.uid = "user4"
AND NOT (user)-[:RATED]->(movie2)
RETURN oouser
What you are looking for, mathematically speaking, is a simplified Jaccard index between two users. That is, how similar are they based on how many things they have in common. I say simplified because we are not taking into account the movies they disagree about. Essentially, and following your order, it would be:
1) Get the total weight of every Attribute for every user. For instance:
MATCH (user:User{name:'user1'})
OPTIONAL MATCH (user)-[r:RATED]->(m:Movie)-[w:WEIGHT]->(a:Attribute)
WITH user, r.rating * w.weight AS totalWeight, a
WITH user, a, sum(totalWeight) AS totalWeight
We need the last line because we had a row for each Movie-Attribute combination
2) Then we get users with similar tastes. This is a performance danger zone, so some filtering might be necessary. But brute-forcing it, we get users that like each attribute within a 10% error (for instance):
WITH user, a, totalWeight*0.9 AS minimum, totalWeight*1.10 AS maximum
MATCH (a)<-[w:WEIGHT]-(m:Movie)<-[r:RATED]-(otherUser:User)
// filter before the next WITH, while w, r, minimum and maximum are still in scope
WHERE w.weight * r.rating > minimum AND w.weight * r.rating < maximum
WITH DISTINCT user, otherUser
So now we have a row (unique because of last line) with any otherUser that is a match. Here, to be honest, I would need to try to be sure if otherUsers with only 1 genre match would be included.. if they are, an additional filter would be needed. But I think that should go after we get this going.
3) Now it's easy:
MATCH (otherUser)-[r:RATED]->(m:Movie)
WHERE NOT (user)-[:RATED]->(m)
RETURN m, sum(r.rating) AS totalRating ORDER BY totalRating DESC
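For readers who want to check the arithmetic of the three steps above outside the database, here is a rough plain-Python sketch. It is not Neo4j code, and the sample data and function names are hypothetical. It also excludes the user from their own matches explicitly, which is an added assumption.

```python
from collections import defaultdict

# Hypothetical sample data (not from the original post).
ratings = {  # user -> {movie: rating}
    "user1": {"A": 5},
    "user2": {"A": 5, "B": 4},
    "user3": {"C": 1},
}
weights = {  # movie -> {attribute: weight}
    "A": {"adventure": 10},
    "B": {"adventure": 5},
    "C": {"romance": 10},
}


def attribute_totals(user):
    """Step 1: sum rating * weight per attribute over the user's rated movies."""
    totals = defaultdict(float)
    for movie, rating in ratings[user].items():
        for attr, weight in weights[movie].items():
            totals[attr] += rating * weight
    return dict(totals)


def similar_users(user, tolerance=0.10):
    """Step 2: other users with a rating * weight product inside the tolerance band."""
    found = set()
    for attr, total in attribute_totals(user).items():
        low, high = total * (1 - tolerance), total * (1 + tolerance)
        for other, rated in ratings.items():
            if other == user:  # assumption: never match the user with themselves
                continue
            for movie, rating in rated.items():
                if low < rating * weights[movie].get(attr, 0) < high:
                    found.add(other)
    return found


def recommend(user):
    """Step 3: rank movies the user has not rated by the similar users' summed ratings."""
    scores = defaultdict(float)
    for other in similar_users(user):
        for movie, rating in ratings[other].items():
            if movie not in ratings[user]:
                scores[movie] += rating
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With this data, user1's adventure total is 50, user2 falls inside the 45 to 55 band through movie A, and `recommend("user1")` suggests movie B. Note that, like the query, step 2 compares a single movie's product against the user's summed total, which is one reason the matching may need refinement on real data.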
As mentioned before, the tricky part is 2), but once we know how to get the math going, it should be easier. Oh, and about the math: for it to work properly, the attribute weights of each movie should sum to 1 (normalizing). Otherwise, the difference between total weights for movies would cause an unfair comparison.
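A minimal sketch of that normalization in plain Python, assuming the weights of one movie are available as a dictionary (the function name and sample values are hypothetical):

```python
def normalize_weights(attr_weights):
    """Scale one movie's attribute weights so that they sum to 1."""
    total = sum(attr_weights.values())
    if total == 0:
        # Nothing to scale; return an unchanged copy.
        return dict(attr_weights)
    return {attr: weight / total for attr, weight in attr_weights.items()}
```

For example, weights of 10, 5 and 5 become 0.5, 0.25 and 0.25, so a movie with many heavily weighted attributes no longer dominates one with few.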
I wrote this without proper studying (paper, pencil, equations, statistics) and trying the code in a sample dataset. I hope it can help you anyway!
In case you want this recommendation without taking into account user ratings or attribute weights, it should be enough to substitute the math in 1) and 2) with just w.weight or r.rating, respectively. The RATED and WEIGHT relationships would still be used, so for instance an avid consumer of Adventure movies would be recommended movies by other consumers of Adventure movies, but the result would not be modified by ratings or by attribute weight, whichever we chose to drop.
EDIT: Code edited to fix syntax errors discussed in comments.
Answer to your 1st query:
Checks which attributes the user has liked (implicitly by liking the movies)
MATCH (user:User)
WHERE user.uid = "user4"
OPTIONAL MATCH (user)-[r:RATED]->(m:Movie)
OPTIONAL MATCH (m)-[w:WEIGHT]->(a:Attribute)
RETURN user, collect({ a: a.title })
The query chains the matches: it first finds the movies rated by the user, then finds the attributes of those movies, and finally returns the list of liked attributes.
You can change the return statement to collect(a) AS attributes if you need the entire node.
I have the following labels:
Tag
Genre
Actor
Director
Movie
User
UsersSerachHistory
My app has a search bar where users can type and search for anything. I am storing the users' search content for a limited period of time for future recommendations. What would be the query for recommendations based on a user's search history? Do I need to create a relationship for the search history? After going through some tutorials on recommendations, I was able to write the following query:
MATCH (m:Movie)<-[:LIKE]-(p:Person {personId : 1})
OPTIONAL MATCH (t:Tag)
WITH collect(distinct t.tagName) as c1
OPTIONAL MATCH (g:Genre)
WITH collect(distinct g.name) + c1 as c2
OPTIONAL MATCH (l:Language)
WITH collect(distinct l.languageName) + c2 as c3
RETURN c3
It is not a complete query, only a rough idea, and I am not sure whether it is the correct way. Can anybody help me achieve this? Thanks much!!
Well, with your current model, I assume you can do recommendations like:
Find people who like the same movies as you; what other movies do they
like that you haven't watched yet?
MATCH (p:Person {personId: 1})-[:LIKE]->(movie)<-[:LIKE]-(other)
WITH distinct other, p
MATCH (other)-[:LIKE]->(reco)
WHERE NOT (p)-[:LIKE]->(reco)
RETURN reco, count(*) as score
ORDER BY score DESC
You can apply the same kind of query to find movies that have the same genres, etc., and combine the results afterwards.
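The scoring in the query above can be illustrated in plain Python. This is a sketch over hypothetical in-memory data, not Neo4j code: each unseen movie gets one point per like-minded person who liked it.

```python
from collections import Counter


def recommend_by_shared_likes(likes, person):
    """likes: dict mapping a person to the set of movies they like."""
    mine = likes[person]
    # People who share at least one liked movie with `person`.
    others = [o for o, theirs in likes.items() if o != person and mine & theirs]
    scores = Counter()
    for other in others:
        for movie in likes[other] - mine:  # only movies `person` has not liked yet
            scores[movie] += 1
    return scores.most_common()


# Hypothetical sample data.
sample_likes = {
    "p": {"M1"},
    "a": {"M1", "M2", "M3"},
    "b": {"M1", "M2"},
    "c": {"M4"},
}
```

In this sample, M2 scores 2 and M3 scores 1, while M4 is never suggested because person c shares no liked movie with p.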
There is a good blog post with lot of example queries for recommendations with Cypher : http://www.markhneedham.com/blog/2015/03/27/neo4j-generating-real-time-recommendations-with-cypher/
For recommendations based on search, an easy solution is to split the search string into elements, for example :
WITH split("action movie with arnold in 1997", " ") AS elements
MATCH (m:Movie)<-[r]-(object)
WHERE object.value IN elements
RETURN m, count(*) as score
This would assume that all nodes representing a property of a movie would have a common value property, so :Tag(value), :Year(value), :Title(value)
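As a rough illustration of what that query computes, here is a plain-Python sketch. The data layout and names are hypothetical; like the query, it matches exact, case-sensitive values.

```python
def score_movies_by_keywords(search, values_by_movie):
    """values_by_movie: dict mapping a movie to the set of `value` strings linked to it."""
    elements = set(search.split(" "))
    scores = {}
    for movie, values in values_by_movie.items():
        hits = len(values & elements)  # number of linked values found in the search
        if hits:
            scores[movie] = hits
    return scores


# Hypothetical sample data.
sample_values = {
    "X": {"action", "arnold", "1997"},
    "Y": {"comedy", "1997"},
    "Z": {"drama"},
}
```

For the search string "action movie with arnold in 1997", movie X scores 3, Y scores 1, and Z is not returned at all.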
This is fairly basic. In common recommender systems based on search history, you would model the history as a timeline:
(user)-[:LAST_SEARCH]->(s)-[:PREV_SEARCH]->(s)..
(s)-[:HAS_KEYWORD]->(keyword)
Then you compute the similarity between search histories continuously as a background job. Common choices are cosine similarity or a likelihood function.
You can then find potentially similar searches and return movies based on the similarity between the current user's history and other users' histories.
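As an illustration of the cosine similarity option, here is a minimal Python sketch that treats each search history as a bag of keywords. Representing a history as a plain list of keywords is an assumption made for this example; a real background job would read them from the graph.

```python
import math
from collections import Counter


def cosine_similarity(keywords_a, keywords_b):
    """Cosine similarity between two keyword lists, using keyword counts as vectors."""
    a, b = Counter(keywords_a), Counter(keywords_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty history is similar to nothing
    return dot / (norm_a * norm_b)
```

Two histories with the same keywords score close to 1, and histories with no keyword in common score 0.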
Lastly of course, you can combine all of the recommendation logic and compute a final score.
And based on your comment :
A user's search keyword could be anything, like a movie title, actor name,
tag, etc. So, for example, if it is a tag name then I want to present
those movies having the same tag
This is more pattern matching and doesn't really fall in the topic of recommendations.
I want to determine groups of users who have common interests.
Data Model and Characteristics
User and Interest are node labels and represent unique nodes
LIKES is the relationship among them, (User)-[:LIKES]->(Interest)
All properties of nodes are indexed
Relation nature can be characterized as many to many between the nodes
There are 300+ interests and 120,000+ users
I used the following query to count the users who share one given interest with each of the other interests:
MATCH (u:User)-[:LIKES]-(i:Interest)
WHERE i.name = "Baking"
WITH u
MATCH (u)-[:LIKES]-(i:Interest)
WHERE i.name <> "Baking"
RETURN i.name, COUNT(u) AS userCount
ORDER BY userCount DESC
I tried making a query that handles 3 common interests, but that made it slower. I think this is not a good, scalable design. Can anyone help?
It may not be feasible, but the end goal is to calculate n x n combinations of interests.
Maybe you should limit the interests and only take the top five or so?
Also, I don't know your data model, but is each interest a unique node? That would speed up the query, because every [has interest]->(baking) relationship then points to the same node and you can just start from baking to get all the users.
Maybe flip your query and start from the interest (Cypher is strange), or you can force the query to use indexes.
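If the end goal really is the full n x n table of interest combinations, one option is to count all pairs in a single pass over the users rather than issuing one query per interest. The following Python sketch shows the idea on hypothetical in-memory data; it assumes the (user, interest) pairs have already been exported from the database.

```python
from collections import Counter
from itertools import combinations


def interest_pair_counts(interests_by_user):
    """Count, for every pair of interests, how many users like both."""
    pairs = Counter()
    for interests in interests_by_user.values():
        # Sorting gives each pair one canonical order, e.g. ("Baking", "Cooking").
        for a, b in combinations(sorted(interests), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical sample data.
sample_interests = {
    "u1": {"Baking", "Cooking"},
    "u2": {"Baking", "Cooking", "Hiking"},
    "u3": {"Hiking"},
}
```

With 300 interests there are at most 44,850 distinct pairs, so the resulting table stays small even though the number of users is large.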