I have Following labels :-
Tag
Genre
Actor
Director
Movie
User
UsersSerachHistory
My app user has search bar where they can type and search anything , I am storing users searched contents for a limited period of time for future recommendations.What will be the query for Recommendation based on Users search history? Should I need to create relationship of search history? After gone through some tutorial for recommendations , I am little bit able to write following query -
MATCH (m:Movie)<-[:LIKE]-(p:Person {personId : 1})
OPTIONAL MATCH (t:Tag)
WITH collect(distinct t.tagName) as c1
OPTIONAL MATCH (g:Genre)
WITH collect(distinct g.name) + c1 as c2
OPTIONAL MATCH (l:Language)
WITH collect(distinct l.languageName) + c2 as c3
RETURN c3
It is not a complete query but a rough idea and am not sure is it correct way? Can anybody help me to achieve it? Thanks much!!
Well with your current model, I assume you can do recommendations like :
Find people liking the same movies as you, what other movies do they
like that you didn't watched yet
MATCH (p:Person {personId: 1})-[:LIKE]->(movie)<-[:LIKE]-(other)
WITH distinct other, p
MATCH (other)-[:LIKE]->(reco)
WHERE NOT (p)-[:LIKE]->(reco)
RETURN reco, count(*) as score
ORDER BY score DESC
You can apply the same kind of queries for finding movies having the same genres, etc.. and combine the results afterwards.
There is a good blog post with lot of example queries for recommendations with Cypher : http://www.markhneedham.com/blog/2015/03/27/neo4j-generating-real-time-recommendations-with-cypher/
For recommendations based on search, an easy solution is to split the search string into elements, for example :
WITH split("action movie with arnold in 1997", " ") AS elements
MATCH (m:Movie)<-[r]-(object)
WHERE object.value IN elements
RETURN m, count(*) as score
This would assume that all nodes representing a property of a movie would have a common value property, so :Tag(value), :Year(value), :Title(value)
This is kind of basic, in common recommender systems based on search history, you would model the history like a timeline :
(user)-[:LAST_SEARCH]->(s)-[:PREV_SEARCH]->(s)..
(s)-[:HAS_KEYWORD]->(keyword)
Then you will compute the similarity between search histories continuously as a background job. A common algorithm is cosine similarity or likelihood function
You then can find potential similar searches and returned movies based on the similarity with the current user history and other user histories.
Lastly of course, you can combine all of the recommendation logic and compute a final score.
And based on your comment :
Users search keyword could be anything like movie title , actor name ,
tag etc.So for example if it is a tag name then I want to present
those movies having the same tag
This is more pattern matching and doesn't really fall in the topic of recommendations.
Related
Is there a way to order by a calculated value based on relationships, rather than a label? For reference, I have a database containing users and skills. If applicable, each user node has a relationship with a skill node. Each skill has a specific value tied to it that represents how important that skill is. If a user wants to find similar users, what I am currently doing matching all distinct users with similar skills. What I want to do is sum up the values contained in each skill node that I'm looking for for a particular user, and sort from greatest to least. For example, if I'm looking for people that like to swim, run, and bike if Billy likes to swim and run I would take the values stored in each similar skill and sum them to use as the property to sort by. Is this possible in purely cypher, or would I have to return the list of results and then calculate/sort outside of cypher? If anyone has any other advice on how to better structure the database that would also be helpful.
This is pretty easy in Cypher and is for sure documented in a lot of places.
Here a couple of examples :
Finding users like Bob, based on similar skills, order by sum of skill importance on the skill node :
MATCH (n:User {name: 'Bob'})-[:HAS_SKILL]->(skill)<-[:HAS_SKILL]-(otherUser)
RETURN otherUser.name AS name, sum(skill.score) AS score
ORDER BY score DESC
In some graph models, each user can be associated a score to each skill, in which case the score would be on the relationship between the user and the skill, you can then sum up those as well :
MATCH (n:User {name: 'Bob'})-[r1:HAS_SKILL]->(skill)<-[r2:HAS_SKILL]-(otherUser)
RETURN otherUser.name AS name, sum(r1.score + r2.score) AS score
ORDER BY score DESC
I am using neo4j to setup a recommender system. I have the following setup:
Nodes:
Users
Movies
Movie attributes (e.g. genre)
Relationships
(m:Movie)-[w:WEIGHT {weight: 10}]->(a:Attribute)
(u:User)-[r:RATED {rating: 5}]->(m:Movie)
Here is a diagram of how it looks:
I am now trying to figure out how to apply a collaborative filtering scheme that works as follows:
Checks which attributes the user has liked (implicitly by liking the movies)
Find similar other users that have liked these similar attributes
Recommend the top movies to the user, which the user has NOT seen, but similar other users have seen.
The condition is obviously that each attribute has a certain weight for each movie. E.g. the genre adventure can have a weight of 10 for the Lord of Rings but a weight of 5 for the Titanic.
In addition, the system needs to take into account the ratings for each movies. E.g. if other user has rated Lord of the Rings 5, then his/her attributes of the Lord of Ranges are scaled by 5 and not 10. The user that has rated the implicit attributes also close to 5 should then get this movie recommended as opposed to another user that has rated similar attributes higher.
I made a start by simply recommending only other movies that other users have rated, but I am not sure how to take into account the relationships RATING and WEIGHT. It also did not work:
MATCH (user:User)-[:RATED]->(movie1)<-[:RATED]-(ouser:User),
(ouser)-[:RATED]->(movie2)<-[:RATED]-(oouser:User)
WHERE user.uid = "user4"
AND NOT (user)-[:RATED]->(movie2)
RETURN oouser
What you are looking for, mathematically speaking, is a simplified Jaccard index between two users. That is, how similar are they based on how many things they have in common. I say simplified because we are not taking into account the movies they disagree about. Essentially, and following your order, it would be:
1) Get the total weight of every Attribute for every user. For instance:
MATCH (user:User{name:'user1'})
OPTIONAL MATCH (user)-[r:RATED]->(m:Movie)->[w:WEIGHT]->(a:Attribute)
WITH user, r.rating * w.weight AS totalWeight, a
WITH user, a, sum(totalWeight) AS totalWeight
We need the last line because we had a row for each Movie-Attribute combination
2) Then, we get users with similar tastes. This is a performance danger zone, some filtering might be neccesary. But brute forcing it, we get users that like each attribute within an 10% error (for instance)
WITH user, a, totalWeight*0.9 AS minimum, totalWeight*1.10 AS maximum
MATCH (a)<-[w:WEIGHT]-(m:Movie)<-[r:RATES]-(otherUser:User)
WITH user, a, otherUser
WHERE w.weight * r.rating > minimum AND w.weight * r.rating < maximum
WITH user, otherUser
So now we have a row (unique because of last line) with any otherUser that is a match. Here, to be honest, I would need to try to be sure if otherUsers with only 1 genre match would be included.. if they are, an additional filter would be needed. But I think that should go after we get this going.
3) Now it´s easy:
MATCH (otherUser)-[r:RATES]->(m:Movie)
WHERE NOT (user)-[:RATES]->(m)
RETURN m, sum(r.rating) AS totalRating ORDER BY totalRating DESC
As mentioned before, the tricky part is 2), but after we know how to get the math going, it should be easier. Oh, and about math, for it to work properly, total weights for a movie should sum 1 (normalizing). In any other case, the difference between total weights for movies would cause an unfair comparison.
I wrote this without proper studying (paper, pencil, equations, statistics) and trying the code in a sample dataset. I hope it can help you anyway!
In case you want this recommendation without taking into account user ratings or attribute weights, it should be enough to substitute the math in lines in 1) and 2) with just r.rating or w.weight, respectively. RATES and WEIGHTS relationships would still be used, so for instance an avid consumer of Adventure movies would be recommended Movies by consumers of Adventure movies, but not modified by ratings or by attribute weight, as we chose.
EDIT: Code edited to fix syntax errors discussed in comments.
Answer to your 1st query:
Checks which attributes the user has liked (implicitly by liking the movies)
MATCH (user:User)
OPTIONAL MATCH (user)-[r:RATED]->(m:movie)
OPTIONAL MATCH (m)-[r:RATED]->(a:Attribute)
WHERE user.uid = "user4"
RETURN user, collect ({ a:a.title })
It is a subquery construct where in you find the movies rated by the user and then find attributes of the movies and finally return list of liked attributes
you can modify return statement to collect (a) as attributes if you need entire node
So, i trying to build a basic recommendation system, i first get what the people who liked this movie also liked (collaborative filtring)(user based), then i get a chunk of various data (movies), because lets say people who liked toy story may also like SCI-fi movies. but movies of this type is irrelative to toy story very much, so i want to filter the results again by its genres, toy story has 5 genres (Animation, Action, Adventure, etc) i want to only get movies which have share these genres in common.
this my cypher query
match (x)<-[:HAS_GENRE]-(ee:Movie{id:1})<-[:RATED{rating: 5}]
-(usr)-[:RATED{rating: 5}]->(another_movie)<-[:LINK]-(l2:Link),
(another_movie)-[:HAS_GENRE]->(y:Genre)
WHERE ALL (m IN x.name WHERE m IN y.name)
return distinct y.name, another_movie, l2.tmdbId limit 200
the first record i get back is star wars 1977, which has only Adventure genre matches toy story genres.. help me writing better cypher
There are a few things we can do to improve the query.
Collecting the genres should allow for the correct WHERE ALL clause later. We can also hold off on matching to the recommended movie's Link node until we filter down to the movies we want to return.
Give this one a try:
MATCH (x)<-[:HAS_GENRE]-(ee:Movie{id:1})
// collect genres so only one result row so far
WITH ee, COLLECT(x) as genres
MATCH (ee)<-[:RATED{rating: 5}]-()-[:RATED{rating: 5}]->(another_movie)
WITH genres, DISTINCT another_movie
// don't match on genre until previous query filters results on rating
MATCH (another_movie)-[:HAS_GENRE]->(y:Genre)
WITH genres, another_movie, COLLECT(y) as gs
WHERE size(genres) <= size(gs) AND ALL (genre IN genres WHERE genre IN gs)
WITH another_movie limit 200
// only after we limit results should we match to the link
MATCH (another_movie)<-[:LINK]-(l2:Link)
RETURN another_movie, l2.tmdbId
As movies are likely going to have many many ratings, the match to find movies both rated 5 is going to be the most expensive part of the query. If many of your queries rely on a rating of 5, you may want to consider creating a separate [:MAX_RATED] relationship whenever a user rates a movie a 5, and use those [:MAX_RATED] relationships for queries like these. That ensures that you don't initially match to a ton of rated movies that all have to be filtered by their rating value.
Alternately, if you want to consider recommendations based on average ratings for movies, you may want to consider caching a computed average of all ratings for every movie (maybe the computation gets rerun for all movies a couple times a day). If you add an index on the average rating property on movie nodes, it should provide faster matching to movies that are rated similarly.
I am trying to write a query which looks for potential friends in a Neo4j db based on common friends and interests.
I don't want to post the whole query (part of school assignment), but this is the important part
MATCH (me:User {firstname: "Name"}), (me)-[:FRIEND]->(friend:User)<-[:FRIEND]-(potential:User), (me)-[:MEMBER]->(i:Interest)
WHERE NOT (potential)-[:FRIEND]->(me)
WITH COLLECT(DISTINCT potential) AS potentialFriends,
COLLECT(DISTINCT friend) AS friends,
COLLECT(i) as interests
UNWIND potentialFriends AS potential
/*
#HANDLING_FINDINGS
Here I count common friends, interests and try to find relationships between
potential friends too -- hence the collect/unwind
*/
RETURN potential,
commonFriends,
commonInterests,
(commonFriends+commonInterests) as totalPotential
ORDER BY totalPotential DESC
LIMIT 10
In the section #HANDLING_FINDINGS I use the found potential friends to find relationships between each other and calculate their potential (i.e. sum of shared friends and common interests) and then order them by potential.
The problem is that there might be users with no friends whom I would also like to recommend someone friends.
My question - can I somehow insert a few random users into the "potential" findings if their count is below 10 so that everyone gets a recommendation?
I have tried something like this
...
UNWIND potentialFriends AS potential
CASE
WHEN (count(potential) < 10 )
...
But that produced an error as soon as it hit start of the CASE. I think that case can be used only as part of a command like return? (maybe just return)
Edit with 2nd related question:
I was already thinking of matching all users and then ranking them based on common friends/interestes, but wouldn't searching through the whole DB be intensive?
A CASE expression can be used wherever a value is needed, but it cannot be used as a complete clause.
With respect to your main question, you can put a WITH clause like the following between your existing WITH and UNWIND clauses:
WITH friends, interests,
CASE WHEN SIZE(potentialFriends) < 10 THEN {randomFriends} ELSE potentialFriends END AS potentialFriends
If the size of the potentialFriends collection is less than 10, the CASE expression assigns the value of the {randomFriends} parameter to potentialFriends.
As for your second question, yes it would be expensive.
I'm a Cypher newbie so I might be missing something obvious but after reading all the basic recommendation engine posts/tutorials I could find, I can't seem to be able to solve this so all help is appreciated.
I'm trying to make a recommendation function that recommends Places to User based on Tags from previous Places she enjoyed. User have LIKES relationship to Tag which carries weight property. Places have CONTAINS relationship with Tag but Contain doesn't have any weights associated with it. Also the more Tags with LIKES weighted above certain threshold (0.85) Place has, the higher it should be ordered so this would add SUM aggregator.
(User)-[:LIKES]->(Tag)<-[:CONTAINS]-(Place)
My problem is that I can't wrap my head around how to order Places based on the amount of Tags pointing to it that have LIKES relationship with User and then how to use LIKES weights to order Places.
Based on the following example neo4j console : http://console.neo4j.org/r/klmu5l
The following query should do the trick :
MATCH (n:User {login:'freeman.williamson'})-[r:LIKES]->(tag)
MATCH (place:Place)-[:CONTAINS]->(tag)
WITH place, sum(r.weight) as weight, collect(tag.name) as tags
RETURN place, size(tags) as rate, weight
ORDER BY rate DESC, weight DESC
Which returns :
(42:Place {name:"Alveraville"}) 6 491767416
(38:Place {name:"Raynorshire"}) 5 491766715
(45:Place {name:"North Kristoffer"}) 5 491766069
(36:Place {name:"Orrinmouth"}) 5 491736638
(44:Place {name:"New Corachester"}) 5 491736001
Explanation :
I match the user and the tags he likes
I match the places containing at least one tag he likes
Then I use WITH to pipe the sum of the rels weights, a collection of the tags, and the place
Then I return those except I count with size the number of tags in the collection
All ordered in descending order