I'm a Cypher newbie so I might be missing something obvious, but after reading all the basic recommendation engine posts/tutorials I could find, I still can't solve this, so all help is appreciated.
I'm trying to write a recommendation query that recommends Places to a User based on Tags from previous Places she enjoyed. A User has a LIKES relationship to a Tag, and that relationship carries a weight property. A Place has a CONTAINS relationship to a Tag, but CONTAINS doesn't carry any weight. Also, the more Tags a Place has whose LIKES weight is above a certain threshold (0.85), the higher it should be ranked, so this would add a SUM aggregation.
(User)-[:LIKES]->(Tag)<-[:CONTAINS]-(Place)
My problem is that I can't wrap my head around how to order Places by the number of Tags pointing to them that the User LIKES, and then how to use the LIKES weights to order Places.
Based on the following example Neo4j console: http://console.neo4j.org/r/klmu5l
The following query should do the trick:
MATCH (n:User {login:'freeman.williamson'})-[r:LIKES]->(tag)
MATCH (place:Place)-[:CONTAINS]->(tag)
WITH place, sum(r.weight) as weight, collect(tag.name) as tags
RETURN place, size(tags) as rate, weight
ORDER BY rate DESC, weight DESC
Which returns :
(42:Place {name:"Alveraville"}) 6 491767416
(38:Place {name:"Raynorshire"}) 5 491766715
(45:Place {name:"North Kristoffer"}) 5 491766069
(36:Place {name:"Orrinmouth"}) 5 491736638
(44:Place {name:"New Corachester"}) 5 491736001
Explanation :
I match the user and the tags he likes
I match the places containing at least one tag he likes
Then I use WITH to pipe the sum of the rels weights, a collection of the tags, and the place
Then I return those except I count with size the number of tags in the collection
All ordered in descending order
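To make the aggregation concrete, here is a small Python sketch of what the query computes, run on made-up data (the tag names, weights and place-to-tag mapping are hypothetical, not taken from the console). It also shows one possible way to apply the 0.85 threshold from the question, which the query above does not do.

```python
# Hypothetical toy data: tag -> weight of the user's LIKES relationship
likes = {"beach": 0.9, "hiking": 0.95, "museum": 0.4}

# Hypothetical toy data: place -> tags it CONTAINS
contains = {
    "Alveraville": ["beach", "hiking", "museum"],
    "Raynorshire": ["beach", "museum"],
    "Orrinmouth": ["hiking"],
}


def rank_places(likes, contains, threshold=0.0):
    """Rank places by number of liked tags, then by summed LIKES weight."""
    rows = []
    for place, tags in contains.items():
        # Keep only tags the user likes with a weight above the threshold.
        matched = [t for t in tags if t in likes and likes[t] > threshold]
        if matched:
            rate = len(matched)                      # size(collect(tag.name))
            weight = sum(likes[t] for t in matched)  # sum(r.weight)
            rows.append((place, rate, weight))
    # ORDER BY rate DESC, weight DESC
    return sorted(rows, key=lambda row: (-row[1], -row[2]))


ranked = rank_places(likes, contains, threshold=0.85)
for place, rate, weight in ranked:
    print(place, rate, weight)
```

In Cypher, the equivalent threshold would presumably be a WHERE r.weight > 0.85 placed right after the first MATCH, so that only strongly liked tags reach the aggregation.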
Excuse the bad title, I'm a beginner with Cypher and Graph databases in general. I'm not sure if the title fully captures what I am trying to ask, please let me know if you have any better titles!
I have a very simple graph setup with User nodes and Movie nodes, and there is a relationship from a User to a Movie called :REVIEWED with a rating property that carries the user's rating (1.0-5.0 inclusive). See the diagram below:
I think this design makes sense for a movie system that captures user reviews. I don't think reviews should exist as their own nodes, because they are better represented as a relationship between the user (reviewer) and the movie in a graph. Not to mention that the whole point of allowing properties on relationships is to express scale/weight/metadata, and this is a great use case for them. However, due to this design I have been having a hard time coming up with a Cypher query to do the following:
Find the top ten movies with at least one review rating less than 3.
So that is, we want to sort the movies based on their average rating however at least one review must be less than a score of 3.0. The query I used to sort the movies based on their average rating is:
MATCH (movie:Movie)<-[review:REVIEWED]-(user:User)
RETURN movie.movieTitle, avg(review.rating) as avgRating
ORDER BY avgRating DESC
LIMIT 10
This makes sense to me, however when I try to limit the path to reviews with a rating less than 3, see below:
MATCH (movie:Movie)<-[review:REVIEWED]-(:User)
WHERE review.rating < 3
RETURN movie.movieTitle, avg(review.rating) as avgRating
ORDER BY avgRating DESC
LIMIT 10
Only the paths that have relationships with a rating less than 3 get matched, which is what I should get. However, the issue is when we average the ratings it's only averaging the ratings less than 3.0.
Ideally we want to include all the reviews for a movie as long as there exists at least one review for that movie with a rating less than 3.0, regardless of whether it is in the matched path. This is where I am getting confused. Because Cypher uses patterns to match paths in the graph, how can we use it to check all paths from a node, see if a condition is satisfied, and then continue to match all paths based on that result?
Looking forward to hearing what you guys think, thanks in advance!
You need a two-part query: first match the movies that have at least one review score under 3, then match all of their reviews again and average the ratings:
MATCH (movie:Movie)<-[review:REVIEWED]-(:User)
WHERE review.rating < 3
WITH DISTINCT movie
MATCH (movie)<-[review:REVIEWED]-(:User)
RETURN movie.movieTitle, avg(review.rating) AS avgRating
ORDER BY avgRating DESC
LIMIT 10
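The two stages can be illustrated outside the database with a short Python sketch. The review data and the function name are hypothetical; the point is only that the filter picks the movies, while the average is taken over all of their reviews.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: one (movie title, rating) pair per REVIEWED relationship
reviews = [
    ("Movie A", 5.0), ("Movie A", 2.0),
    ("Movie B", 4.5), ("Movie B", 4.0),
    ("Movie C", 1.0), ("Movie C", 2.0),
]


def top_movies_with_low_review(reviews, cutoff=3.0, limit=10):
    # Stage 1: MATCH ... WHERE review.rating < 3, then WITH DISTINCT movie
    flagged = {title for title, rating in reviews if rating < cutoff}

    # Stage 2: match ALL reviews of the flagged movies again
    ratings = defaultdict(list)
    for title, rating in reviews:
        if title in flagged:
            ratings[title].append(rating)

    # RETURN movie, avg(rating) ORDER BY avgRating DESC LIMIT 10
    averages = [(title, mean(rs)) for title, rs in ratings.items()]
    averages.sort(key=lambda row: row[1], reverse=True)
    return averages[:limit]


result = top_movies_with_low_review(reviews)
print(result)
```

Movie B never appears because none of its reviews is below 3, while Movie A keeps its 5.0 review in the average even though only its 2.0 review triggered the match.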
I am using neo4j to setup a recommender system. I have the following setup:
Nodes:
Users
Movies
Movie attributes (e.g. genre)
Relationships
(m:Movie)-[w:WEIGHT {weight: 10}]->(a:Attribute)
(u:User)-[r:RATED {rating: 5}]->(m:Movie)
Here is a diagram of how it looks:
I am now trying to figure out how to apply a collaborative filtering scheme that works as follows:
Checks which attributes the user has liked (implicitly by liking the movies)
Find similar other users that have liked these similar attributes
Recommend the top movies to the user, which the user has NOT seen, but similar other users have seen.
The condition is obviously that each attribute has a certain weight for each movie. E.g. the genre adventure can have a weight of 10 for The Lord of the Rings but a weight of 5 for Titanic.
In addition, the system needs to take into account the rating for each movie. E.g. if another user has rated The Lord of the Rings 5, then his/her attributes for The Lord of the Rings are scaled by 5 and not 10. A user whose implicit attribute ratings are also close to 5 should then get this movie recommended, as opposed to another user who has rated similar attributes higher.
I made a start by simply recommending other movies that other users have rated, but I am not sure how to take the RATED and WEIGHT relationships into account. It also did not work:
MATCH (user:User)-[:RATED]->(movie1)<-[:RATED]-(ouser:User),
(ouser)-[:RATED]->(movie2)<-[:RATED]-(oouser:User)
WHERE user.uid = "user4"
AND NOT (user)-[:RATED]->(movie2)
RETURN oouser
What you are looking for, mathematically speaking, is a simplified Jaccard index between two users. That is, how similar are they based on how many things they have in common. I say simplified because we are not taking into account the movies they disagree about. Essentially, and following your order, it would be:
1) Get the total weight of every Attribute for every user. For instance:
MATCH (user:User{name:'user1'})
OPTIONAL MATCH (user)-[r:RATED]->(m:Movie)-[w:WEIGHT]->(a:Attribute)
WITH user, r.rating * w.weight AS totalWeight, a
WITH user, a, sum(totalWeight) AS totalWeight
We need the last line because we had a row for each Movie-Attribute combination
2) Then, we get users with similar tastes. This is a performance danger zone, and some filtering might be necessary. But brute forcing it, we get users that like each attribute within a 10% error (for instance):
WITH user, a, totalWeight*0.9 AS minimum, totalWeight*1.10 AS maximum
MATCH (a)<-[w:WEIGHT]-(m:Movie)<-[r:RATED]-(otherUser:User)
WHERE w.weight * r.rating > minimum AND w.weight * r.rating < maximum
WITH DISTINCT user, otherUser
So now we have one row per otherUser that is a match (the last line is meant to keep them unique). Here, to be honest, I would need to test to be sure whether otherUsers with only 1 genre match would be included. If they are, an additional filter would be needed. But I think that should go after we get this going.
3) Now it's easy:
MATCH (otherUser)-[r:RATED]->(m:Movie)
WHERE NOT (user)-[:RATED]->(m)
RETURN m, sum(r.rating) AS totalRating ORDER BY totalRating DESC
As mentioned before, the tricky part is 2), but once we know how to get the math going, it should be easier. Oh, and about the math: for it to work properly, the attribute weights for each movie should sum to 1 (normalizing). Otherwise, the difference in total weights between movies would cause an unfair comparison.
I wrote this without proper studying (paper, pencil, equations, statistics) and trying the code in a sample dataset. I hope it can help you anyway!
In case you want this recommendation without taking into account user ratings or attribute weights, it should be enough to substitute the math in 1) and 2) with just r.rating or w.weight, respectively. The RATED and WEIGHT relationships would still be used, so for instance an avid consumer of Adventure movies would be recommended Movies by other consumers of Adventure movies, but the result would not be modified by ratings or by attribute weight, whichever we chose to drop.
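The three steps can also be sketched in plain Python on a tiny in-memory dataset. Everything here is hypothetical (the users, movies, weights, and the function names), the weights are already normalized to sum to 1 per movie, and unlike the Cypher above the sketch explicitly skips the user themselves in step 2.

```python
from collections import defaultdict

# Hypothetical toy data: user -> {movie: rating} (the RATED relationships)
rated = {
    "user1": {"LOTR": 5},
    "user2": {"LOTR": 5, "Hobbit": 4},
    "user3": {"Titanic": 5, "Notebook": 5},
}
# Hypothetical toy data: movie -> {attribute: weight} (the WEIGHT relationships)
weight = {
    "LOTR": {"adventure": 1.0},
    "Hobbit": {"adventure": 1.0},
    "Titanic": {"romance": 1.0},
    "Notebook": {"romance": 1.0},
}


def attribute_totals(user):
    """Step 1: sum(r.rating * w.weight) per attribute for one user."""
    totals = defaultdict(float)
    for movie, rating in rated[user].items():
        for attr, w in weight[movie].items():
            totals[attr] += rating * w
    return totals


def similar_users(user, tolerance=0.10):
    """Step 2: other users with a rating*weight product inside the band."""
    totals = attribute_totals(user)
    found = set()
    for other, movies in rated.items():
        if other == user:  # assumption: do not match the user with themselves
            continue
        for movie, rating in movies.items():
            for attr, w in weight[movie].items():
                if attr not in totals:
                    continue
                minimum = totals[attr] * (1 - tolerance)
                maximum = totals[attr] * (1 + tolerance)
                if minimum < rating * w < maximum:
                    found.add(other)
    return found


def recommend(user):
    """Step 3: unseen movies rated by similar users, by summed rating."""
    scores = defaultdict(float)
    for other in similar_users(user):
        for movie, rating in rated[other].items():
            if movie not in rated[user]:
                scores[movie] += rating
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(recommend("user1"))
```

As in the query, step 2 compares a single movie-attribute product of the other user against the first user's total for that attribute, which is one reason the comparison only behaves sensibly once the numbers are normalized.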
EDIT: Code edited to fix syntax errors discussed in comments.
Answer to your 1st query:
Checks which attributes the user has liked (implicitly by liking the movies)
MATCH (user:User)
WHERE user.uid = "user4"
OPTIONAL MATCH (user)-[r:RATED]->(m:Movie)
OPTIONAL MATCH (m)-[w:WEIGHT]->(a:Attribute)
RETURN user, collect({ a: a.title })
The query chains its matches: it first finds the movies rated by the user, then finds the attributes of those movies, and finally returns the list of liked attributes.
You can change the RETURN clause to collect(a) AS attributes if you need the entire node.
I want to determine groups of users who have common interests.
Data Model and Characteristics
User and Interest are node labels and represent unique nodes
LIKES is the relationship among them, (User)-[:LIKES]->(Interest)
All properties of nodes are indexed
Relation nature can be characterized as many to many between the nodes
There are 300+ interests and 120,000+ users
I used the following query to count the users who share one given interest with each of the other interests:
MATCH (u:User)-[:LIKES]-(i:Interest)
WHERE i.name = "Baking"
WITH u
MATCH (u)-[:LIKES]-(i:Interest)
WHERE i.name <> "Baking"
RETURN i.name, COUNT(u) AS userCount
ORDER BY userCount DESC
I tried writing a query that handles 3 common interests, but that made it slower. I think this is not a good, scalable design. Can anyone help?
It may not be plausible, but the end goal is to calculate all n x n combinations of interests.
Maybe you should limit the interests and only take the top five or so?
Also, I don't know your data model, but is each interest a unique node? That would speed up the query: every [has interest]->(baking) relationship then points to the same node, and you can simply start from baking to get all the users.
Maybe flip your query and start from the interest (Cypher can be strange here), or force the query to use indexes.
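For the n x n goal mentioned in the question, one alternative to running a query per interest is to count co-occurring interest pairs in a single pass over the users. The Python sketch below shows the idea on hypothetical data; it assumes the (user, interests) lists have already been pulled out of the graph.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: user -> set of interests the user LIKES
likes = {
    "ann": {"Baking", "Chess", "Hiking"},
    "bob": {"Baking", "Chess"},
    "cat": {"Hiking", "Baking"},
}


def interest_pair_counts(likes):
    """Count, for every pair of interests, how many users like both."""
    counts = Counter()
    for interests in likes.values():
        # Sorting gives each unordered pair one canonical key.
        for pair in combinations(sorted(interests), 2):
            counts[pair] += 1
    return counts


pair_counts = interest_pair_counts(likes)
for pair, user_count in pair_counts.most_common():
    print(pair, user_count)
```

With 300+ interests there are at most around 45,000 distinct pairs, so the counter stays small; the cost is dominated by how many interests each user has.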
I've got a data model that I want to roughly model after the article posted in this Graphgist.
I'm curious as to the performance that I can expect on the WHERE clause in the case that a given set of 2 nodes has a large number of relationships between them with 'from' and 'to' parameters defined on each edge. When you do a match query like this where you have let's say 100 SELLS relationships, how does Neo4J handle performance of filtering down the edges to just the one(s) that matter based on the WHERE criteria:
MATCH (s:Shop{shop_id:1})-[r1:SELLS]->(p:Product)
WHERE (r1.from <= 1391558400000 AND r1.to > 1391558400000)
MATCH (p)-[r2:STATE]->(ps:ProductState)
WHERE (r2.from <= 1391558400000 AND r2.to > 1391558400000)
RETURN p.product_id AS productId,
ps.name AS product,
ps.price AS price
ORDER BY price DESC
I haven't found a way to index properties on an edge directly so I'm assuming that either the query optimizer can take care of something like this or it just literally traverses the array of edges and finds the one(s) that match.
Neo4j will just traverse all the relationships and read the property value on each one. By default there are no indexes on relationship properties (this can be achieved with the legacy indexes: check the documentation).
Concerning performance, bear in mind that Neo4j is very fast at traversing relationships so while your query is "very expensive", Neo4j can traverse 2 to 4 million relationships per second and per core depending on your hardware configuration.
So, to summarize, for 100 relationships it will run like a flash, but it is not optimized at all currently, so you'll see some drawbacks if you need to run the same operation on 1 million relationships, for example.
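Conceptually, the filter behaves like the linear scan below. This is only an illustration in Python with hypothetical relationship data, not a description of Neo4j internals: every relationship is visited once and its from/to properties are compared against the timestamp.

```python
# Hypothetical toy data: SELLS relationships between one shop and its products
sells = [
    {"product_id": 1, "from": 1391000000000, "to": 1392000000000},
    {"product_id": 2, "from": 1380000000000, "to": 1391000000000},
    {"product_id": 3, "from": 1391558400000, "to": 1400000000000},
]


def valid_at(relationships, timestamp):
    """Keep relationships whose [from, to) interval contains the timestamp."""
    return [
        rel for rel in relationships
        if rel["from"] <= timestamp < rel["to"]  # r.from <= ts AND r.to > ts
    ]


active = valid_at(sells, 1391558400000)
print([rel["product_id"] for rel in active])
```

The work grows linearly with the number of relationships between the two nodes, which matches the point that 100 relationships are negligible while a million would start to be noticeable.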
So I'm still learning how to access relationship properties. I've found several ways to access different aspects of what I'm looking for, but can't seem to piece them together.
neo4j cypher - how to find all nodes that have a relationship to list of nodes
I am getting closer, but can't figure out how to sum a collection and check for length.
MATCH (album:Album)-[r]->(tags:Word)
WHERE tags.name IN ['alpha', 'bravo']
WITH album, COLLECT(tags) as tags, COLLECT(r.weight) as weight
RETURN album, tags, weight
Thank you in advance.
Ok so I found a solution to avoid the COLLECTION issue, apparently anywhere that you collect you can also sum. I'm learning though and enjoying the process!
MATCH (album:Album)-[r]->(tags:Word)
WHERE tags.name IN ['alpha', 'bravo']
WITH album, COLLECT(tags) as tags, SUM(r.weight) as weight
WHERE size(tags) = size(['alpha', 'bravo'])
RETURN album, tags, weight ORDER BY weight ASC LIMIT 10;
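The logic of that query (keep only albums linked to every requested word, then sort by the summed weight) can be sketched in Python as follows. The edge data is hypothetical, and the sketch uses a set per album, so unlike COLLECT it would not count a word twice if an album had two relationships to it.

```python
from collections import defaultdict

# Hypothetical toy data: (album, word, relationship weight)
edges = [
    ("Album X", "alpha", 0.5), ("Album X", "bravo", 0.2),
    ("Album Y", "alpha", 0.9),
    ("Album Z", "alpha", 0.1), ("Album Z", "bravo", 0.1),
    ("Album Z", "charlie", 0.7),
]


def albums_with_all(edges, wanted, limit=10):
    tags = defaultdict(set)
    weight = defaultdict(float)
    for album, word, w in edges:
        if word in wanted:            # WHERE tags.name IN [...]
            tags[album].add(word)     # COLLECT(tags)
            weight[album] += w        # SUM(r.weight)
    # Keep albums that matched every requested word.
    rows = [(a, weight[a]) for a in tags if len(tags[a]) == len(wanted)]
    rows.sort(key=lambda row: row[1])  # ORDER BY weight ASC
    return rows[:limit]


matches = albums_with_all(edges, ["alpha", "bravo"])
print(matches)
```

Album Y is dropped because it only has one of the two words, and the weight of Album Z's unrelated "charlie" relationship never enters the sum.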