Is there a way to perform calculations in a Cypher query? - neo4j

Is there a way to order by a calculated value based on relationships, rather than a label? For reference, I have a database containing users and skills. Where applicable, each user node has a relationship with a skill node. Each skill has a specific value tied to it that represents how important that skill is. If a user wants to find similar users, what I am currently doing is matching all distinct users with similar skills. What I want to do is sum up the values of the matching skill nodes for each user, and sort from greatest to least. For example, if I'm looking for people who like to swim, run, and bike, and Billy likes to swim and run, I would take the values stored in each shared skill and sum them to use as the value to sort by. Is this possible purely in Cypher, or would I have to return the list of results and then calculate/sort outside of Cypher? If anyone has any other advice on how to better structure the database, that would also be helpful.

This is pretty easy in Cypher and is documented in a lot of places.
Here are a couple of examples:
Finding users like Bob, based on similar skills, ordered by the sum of the importance scores stored on the skill nodes:
MATCH (n:User {name: 'Bob'})-[:HAS_SKILL]->(skill)<-[:HAS_SKILL]-(otherUser)
RETURN otherUser.name AS name, sum(skill.score) AS score
ORDER BY score DESC
In some graph models, each user has their own score for each skill, in which case the score would be stored on the relationship between the user and the skill. You can then sum those up as well:
MATCH (n:User {name: 'Bob'})-[r1:HAS_SKILL]->(skill)<-[r2:HAS_SKILL]-(otherUser)
RETURN otherUser.name AS name, sum(r1.score + r2.score) AS score
ORDER BY score DESC
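To sanity-check what the first query computes, here is a minimal Python sketch of the same aggregation over an in-memory copy of the data. The user names, skills, and scores are made up for illustration, and `similar_users` is a hypothetical helper, not part of any Neo4j driver.

```python
from collections import defaultdict

# Hypothetical sample data: skill -> importance score, user -> set of skills.
SKILL_SCORE = {"swim": 3, "run": 2, "bike": 4}
HAS_SKILL = {
    "Bob": {"swim", "run", "bike"},
    "Billy": {"swim", "run"},
    "Ann": {"bike"},
    "Joe": set(),
}


def similar_users(name, has_skill=HAS_SKILL, skill_score=SKILL_SCORE):
    """Return (other_user, score) pairs sorted by summed shared-skill score, descending."""
    totals = defaultdict(int)
    for other, skills in has_skill.items():
        if other == name:
            continue
        # Only skills shared with `name` contribute, like the two-sided MATCH pattern.
        for skill in skills & has_skill[name]:
            totals[other] += skill_score[skill]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


print(similar_users("Bob"))
```

Users with no shared skill (Joe here) do not appear at all, which mirrors how the MATCH pattern produces no row for them.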

Related

Cypher (Neo4j) - Find all relationships as long as one relationship from node satisfies a condition regardless of search path?

Excuse the bad title, I'm a beginner with Cypher and Graph databases in general. I'm not sure if the title fully captures what I am trying to ask, please let me know if you have any better titles!
I have a very simple graph setup with User nodes and Movie nodes, and there exists a relationship from a User to a Movie called :REVIEWED that has a rating property that carries the user's rating (1.0-5.0 inclusive). See the diagram below:
I think this design makes sense for a movie system for capturing user reviews. I don't think that reviews should exist as their own nodes, because they are better represented as a relationship between the user (reviewer) and the movie in a graph. Not to mention that the entire reason properties can exist on relationships is to express scale/weight/metadata of a relationship, and this is a great use case for them. However, due to this design I have been having a hard time coming up with a Cypher query to do the following:
Find the top ten movies with at least one review rating less than 3.
That is, we want to sort the movies by their average rating, but each movie must have at least one review with a rating below 3.0. The query I used to sort the movies by their average rating is:
MATCH (movie:Movie)<-[review:REVIEWED]-(user:User)
RETURN movie.movieTitle, avg(review.rating) as avgRating
ORDER BY avgRating DESC
LIMIT 10
This makes sense to me, however when I try to limit the path to reviews with a rating less than 3, see below:
MATCH (movie:Movie)<-[review:REVIEWED]-(:User)
WHERE review.rating < 3
RETURN movie.movieTitle, avg(review.rating) as avgRating
ORDER BY avgRating DESC
LIMIT 10
Only the paths that have relationships with a rating less than 3 get matched, which is what I should get. However, the issue is when we average the ratings it's only averaging the ratings less than 3.0.
Ideally we want to include all the reviews for a movie as long as there exists at least one review for that movie with a rating less than 3.0, regardless of whether it is in the matched path. This is where I am getting confused. Because Cypher uses patterns to match paths in the graph, how can we use it to check all paths from a node, see if a condition is satisfied, and then continue to match all paths based on that result?
Looking forward to hearing what you guys think, thanks in advance!
You need a two-part query: first match the movies that have at least one review rating under 3, then average all of their ratings:
MATCH (movie:Movie)<-[review:REVIEWED]-(:User)
WHERE review.rating < 3
WITH DISTINCT movie
MATCH (movie)<-[review:REVIEWED]-(:User)
RETURN movie.movieTitle, avg(review.rating) AS avgRating
ORDER BY avgRating DESC
LIMIT 10
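The intended semantics (filter on "has at least one low rating", but average over every rating) can be sketched in plain Python. The movie titles and ratings below are invented for illustration, and `top_movies_with_low_review` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical (movie, rating) pairs standing in for :REVIEWED relationships.
REVIEWS = [
    ("Alien", 5.0), ("Alien", 2.0),
    ("Heat", 4.0), ("Heat", 4.5),
    ("Cats", 1.0), ("Cats", 2.0),
]


def top_movies_with_low_review(reviews, threshold=3.0, limit=10):
    """Average ALL ratings per movie, keeping only movies with a rating below threshold."""
    by_movie = defaultdict(list)
    for movie, rating in reviews:
        by_movie[movie].append(rating)
    averages = [
        (movie, sum(ratings) / len(ratings))
        for movie, ratings in by_movie.items()
        if any(r < threshold for r in ratings)  # the "at least one low review" check
    ]
    averages.sort(key=lambda pair: pair[1], reverse=True)
    return averages[:limit]


print(top_movies_with_low_review(REVIEWS))
```

Alien keeps its 5.0 rating in the average even though only its 2.0 rating passed the filter, which is the behaviour the single-MATCH attempt in the question could not produce.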

Collaborative filtering cypher with attributes in neo4j

I am using Neo4j to set up a recommender system. I have the following setup:
Nodes:
Users
Movies
Movie attributes (e.g. genre)
Relationships
(m:Movie)-[w:WEIGHT {weight: 10}]->(a:Attribute)
(u:User)-[r:RATED {rating: 5}]->(m:Movie)
Here is a diagram of how it looks:
I am now trying to figure out how to apply a collaborative filtering scheme that works as follows:
Checks which attributes the user has liked (implicitly by liking the movies)
Find similar other users that have liked these similar attributes
Recommend the top movies to the user, which the user has NOT seen, but similar other users have seen.
The condition is obviously that each attribute has a certain weight for each movie. E.g. the genre adventure can have a weight of 10 for the Lord of Rings but a weight of 5 for the Titanic.
In addition, the system needs to take into account the ratings for each movie. E.g. if another user has rated Lord of the Rings 5, then his/her attributes of Lord of the Rings are scaled by 5 and not 10. The user that has rated the implicit attributes also close to 5 should then get this movie recommended, as opposed to another user that has rated similar attributes higher.
I made a start by simply recommending other movies that other users have rated, but I am not sure how to take into account the RATED and WEIGHT relationships. It also did not work:
MATCH (user:User)-[:RATED]->(movie1)<-[:RATED]-(ouser:User),
(ouser)-[:RATED]->(movie2)<-[:RATED]-(oouser:User)
WHERE user.uid = "user4"
AND NOT (user)-[:RATED]->(movie2)
RETURN oouser
What you are looking for, mathematically speaking, is a simplified Jaccard index between two users. That is, how similar are they based on how many things they have in common. I say simplified because we are not taking into account the movies they disagree about. Essentially, and following your order, it would be:
1) Get the total weight of every Attribute for every user. For instance:
MATCH (user:User{name:'user1'})
OPTIONAL MATCH (user)-[r:RATED]->(m:Movie)-[w:WEIGHT]->(a:Attribute)
WITH user, r.rating * w.weight AS totalWeight, a
WITH user, a, sum(totalWeight) AS totalWeight
We need the last line because we had a row for each Movie-Attribute combination
2) Then, we get users with similar tastes. This is a performance danger zone, and some filtering might be necessary. But brute-forcing it, we get users that like each attribute within a 10% error (for instance):
WITH user, a, totalWeight*0.9 AS minimum, totalWeight*1.10 AS maximum
MATCH (a)<-[w:WEIGHT]-(m:Movie)<-[r:RATED]-(otherUser:User)
WHERE w.weight * r.rating > minimum AND w.weight * r.rating < maximum
WITH DISTINCT user, otherUser
So now we have a row (unique because of the DISTINCT in the last line) for every otherUser that is a match. Here, to be honest, I would need to test whether otherUsers with only 1 genre match would be included. If they are, an additional filter would be needed. But I think that should come after we get this going.
3) Now it's easy:
MATCH (otherUser)-[r:RATED]->(m:Movie)
WHERE NOT (user)-[:RATED]->(m)
RETURN m, sum(r.rating) AS totalRating ORDER BY totalRating DESC
As mentioned before, the tricky part is 2), but once we know how to get the math going, it should be easier. Oh, and about the math: for it to work properly, the attribute weights of each movie should sum to 1 (normalization). Otherwise, differences in total weight between movies would make the comparison unfair.
I wrote this without proper study (paper, pencil, equations, statistics) and without trying the code on a sample dataset. I hope it can help you anyway!
In case you want this recommendation without taking into account user ratings or attribute weights, it should be enough to replace the product in 1) and 2): to ignore user ratings, use just w.weight; to ignore attribute weights, use just r.rating. The RATED and WEIGHT relationships would still be used, so for instance an avid consumer of Adventure movies would be recommended movies by other consumers of Adventure movies, but not modified by ratings or by attribute weight, as we chose.
EDIT: Code edited to fix syntax errors discussed in comments.
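As a rough check of the three steps, here is a small Python sketch that runs the same logic on in-memory dictionaries. It assumes the RATED and WEIGHT names from the question; the sample users, movies, and numbers are invented, and the function names are hypothetical. Unlike the query, it explicitly skips the user themselves in step 2.

```python
from collections import defaultdict

# Hypothetical data: WEIGHT[movie][attribute] and RATED[user][movie].
WEIGHT = {
    "LOTR": {"adventure": 10},
    "Titanic": {"adventure": 5, "romance": 10},
    "Hobbit": {"adventure": 10},
}
RATED = {
    "user1": {"LOTR": 5},
    "user2": {"Hobbit": 5, "Titanic": 4},
    "user3": {"Titanic": 1},
}


def attribute_totals(user):
    """Step 1: sum rating * weight per attribute over the user's rated movies."""
    totals = defaultdict(float)
    for movie, rating in RATED[user].items():
        for attribute, weight in WEIGHT[movie].items():
            totals[attribute] += rating * weight
    return dict(totals)


def similar_users(user, tolerance=0.10):
    """Step 2: users with some rating * weight inside the tolerance band of a total."""
    found = set()
    for attribute, total in attribute_totals(user).items():
        low, high = total * (1 - tolerance), total * (1 + tolerance)
        for other, ratings in RATED.items():
            if other == user:  # assumption: a user is not similar to themselves
                continue
            for movie, rating in ratings.items():
                weight = WEIGHT[movie].get(attribute)
                if weight is not None and low < weight * rating < high:
                    found.add(other)
    return found


def recommend(user):
    """Step 3: unseen movies rated by similar users, ordered by summed rating."""
    scores = defaultdict(float)
    for other in similar_users(user):
        for movie, rating in RATED[other].items():
            if movie not in RATED[user]:
                scores[movie] += rating
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


print(recommend("user1"))
```

Note that step 2 compares a single movie's rating * weight against the user's summed total, exactly as the query does, so the band only behaves sensibly once the weights are normalized as discussed above.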
Answer to your first point:
Checks which attributes the user has liked (implicitly by liking the movies)
MATCH (user:User)
WHERE user.uid = "user4"
OPTIONAL MATCH (user)-[:RATED]->(m:Movie)
OPTIONAL MATCH (m)-[:WEIGHT]->(a:Attribute)
RETURN user, collect ({ a:a.title })
The query first finds the movies rated by the user, then finds the attributes of those movies, and finally returns the list of liked attributes.
You can change the return statement to collect(a) AS attributes if you need the entire node.

Neo4j - User's search history based recommendation

I have the following labels:
Tag
Genre
Actor
Director
Movie
User
UsersSerachHistory
My app has a search bar where users can type and search for anything. I am storing the users' searched content for a limited period of time for future recommendations. What would the query for recommendations based on a user's search history look like? Do I need to create relationships for the search history? After going through some tutorials on recommendations, I was able to write the following query:
MATCH (m:Movie)<-[:LIKE]-(p:Person {personId : 1})
OPTIONAL MATCH (t:Tag)
WITH collect(distinct t.tagName) as c1
OPTIONAL MATCH (g:Genre)
WITH collect(distinct g.name) + c1 as c2
OPTIONAL MATCH (l:Language)
WITH collect(distinct l.languageName) + c2 as c3
RETURN c3
It is not a complete query, just a rough idea, and I am not sure whether it is the correct approach. Can anybody help me achieve this? Thanks very much!
Well, with your current model, I assume you can do recommendations like:
Find people liking the same movies as you, what other movies do they
like that you haven't watched yet
MATCH (p:Person {personId: 1})-[:LIKE]->(movie)<-[:LIKE]-(other)
WITH distinct other, p
MATCH (other)-[:LIKE]->(reco)
WHERE NOT (p)-[:LIKE]->(reco)
RETURN reco, count(*) as score
ORDER BY score DESC
You can apply the same kind of query to find movies having the same genres, etc., and combine the results afterwards.
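For intuition, the scoring in the query above amounts to counting, for each movie you have not liked, how many like-minded people like it. A minimal Python sketch with invented person ids and titles (`recommend` is a hypothetical helper):

```python
from collections import Counter

# Hypothetical LIKE relationships: person id -> set of liked movie titles.
LIKES = {
    1: {"Alien", "Heat"},
    2: {"Alien", "Ronin", "Blade"},
    3: {"Heat", "Ronin"},
    4: {"Cats"},
}


def recommend(person_id, likes=LIKES):
    """Count, per unseen movie, how many people with a shared like also like it."""
    mine = likes[person_id]
    # People who share at least one liked movie (the DISTINCT `other` in the query).
    others = [p for p, movies in likes.items() if p != person_id and movies & mine]
    scores = Counter()
    for other in others:
        scores.update(likes[other] - mine)  # skip movies the person already likes
    return scores.most_common()


print(recommend(1))
```

Person 4 shares nothing with person 1, so "Cats" never enters the result.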
There is a good blog post with lots of example queries for recommendations with Cypher: http://www.markhneedham.com/blog/2015/03/27/neo4j-generating-real-time-recommendations-with-cypher/
For recommendations based on search, an easy solution is to split the search string into elements, for example :
WITH split("action movie with arnold in 1997", " ") AS elements
MATCH (m:Movie)<-[r]-(object)
WHERE object.value IN elements
RETURN m, count(*) as score
This would assume that all nodes representing a property of a movie would have a common value property, so :Tag(value), :Year(value), :Title(value)
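The keyword-matching idea can be sketched in Python as follows. The movie data is invented, and the sketch assumes, as described above, that every node attached to a movie exposes a common `value` property; `search_scores` is a hypothetical name.

```python
from collections import Counter

# Hypothetical graph: movie title -> `value` properties of its connected nodes.
MOVIE_VALUES = {
    "Batman & Robin": {"action", "arnold", "1997"},
    "Titanic": {"romance", "1997"},
    "Heat": {"action", "1995"},
}


def search_scores(query, movie_values=MOVIE_VALUES):
    """Score each movie by how many connected values appear in the search string."""
    elements = set(query.split(" "))
    scores = Counter()
    for movie, values in movie_values.items():
        hits = len(values & elements)
        if hits:  # movies with no matching value produce no row
            scores[movie] = hits
    return dict(scores)


print(search_scores("action movie with arnold in 1997"))
```

The match is exact and case-sensitive, just like `IN` in the query, so real search strings would likely need lowercasing and stop-word removal first.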
This is fairly basic. In typical recommender systems based on search history, you would model the history as a timeline:
(user)-[:LAST_SEARCH]->(s)-[:PREV_SEARCH]->(s)..
(s)-[:HAS_KEYWORD]->(keyword)
Then you would compute the similarity between search histories continuously as a background job. Common choices are cosine similarity or a likelihood function.
You can then find potentially similar searches and return movies based on the similarity between the current user's history and other users' histories.
Lastly, of course, you can combine all of the recommendation logic and compute a final score.
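As an illustration of the similarity step, here is a minimal cosine similarity between two search histories, each treated as a bag of keywords. This is a sketch with raw term counts as vector components; a real system might weight terms differently, and `cosine_similarity` is a hypothetical helper name.

```python
import math
from collections import Counter


def cosine_similarity(keywords_a, keywords_b):
    """Cosine similarity between two keyword lists, using raw term counts as vectors."""
    a, b = Counter(keywords_a), Counter(keywords_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty history is treated as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity(["action", "arnold"], ["action", "comedy"]))
```

Identical histories score close to 1, histories with no keyword in common score 0.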
And based on your comment :
A user's search keyword could be anything, like a movie title, actor name,
tag, etc. So for example, if it is a tag name, then I want to present
those movies having the same tag
This is more pattern matching and doesn't really fall in the topic of recommendations.

Calculating User Counts with Common Entities

I want to determine groups of users who have common interests.
Data Model and Characteristics
User and Interest are node labels and represent unique nodes
LIKES is the relationship among them, (User)-[:LIKES]->(Interest)
All properties of nodes are indexed
Relation nature can be characterized as many to many between the nodes
There are 300+ interests and 120,000+ users
I used the following query to determine user count with one common interest and all others;
MATCH (u:User)-[:LIKES]-(i:Interest)
WHERE i.name = "Baking"
WITH u
MATCH (u)-[:LIKES]-(i:Interest)
WHERE i.name <> "Baking"
RETURN i.name, COUNT(u) AS userCount
ORDER BY userCount DESC
I tried writing a query that handles 3 common interests, but that made it slower. I think this is not a good, scalable design. Can anyone help?
It may not be feasible, but the end goal is to calculate n x n combinations of interests.
Maybe you should limit the interests and only take the top five or so?
Also, I don't know your data model, but is each interest a unique node? That would speed up the query: every relationship to Baking then points to the same node, and you can simply start from Baking to get all the users.
Maybe flip your query and start from the interest (Cypher can be surprising here), or force the query to use indexes.
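For the n x n goal, one way to see the shape of the computation is to count users per unordered pair of interests in a single pass over users, instead of running one query per interest. The Python sketch below does this on invented data; it is an illustration of the counting only, not a claim about how Neo4j would execute it, and `pair_counts` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical LIKES data: user -> set of interest names.
LIKES = {
    "u1": {"Baking", "Running", "Chess"},
    "u2": {"Baking", "Running"},
    "u3": {"Baking", "Chess"},
    "u4": {"Running"},
}


def pair_counts(likes=LIKES):
    """Count users per unordered pair of interests in one pass over users."""
    counts = Counter()
    for interests in likes.values():
        # Sorting makes ("Baking", "Chess") and ("Chess", "Baking") the same key.
        counts.update(combinations(sorted(interests), 2))
    return counts


print(pair_counts().most_common())
```

With about 300 interests there are at most roughly 45,000 pairs, so the result table stays small even with 120,000+ users.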

In Neo4j for every disjoint subgraph return the node with the most relationships

I’m new to Neo4j and graph theory and I’m trying to figure out if I can use Neo4j to solve a problem I have. Please correct me if I’m using the wrong words to describe stuff. Since I’m new to the subject I haven’t really wrapped my head around what to call everything.
I think the easiest way to describe my problem is with a lot of pictures.
Let’s say you have two disjoint subgraphs that look like this.
From the subgraphs above I want to get a list of subgraphs that fulfills one of two criteria.
Criteria 1.
If a node has a unique relationship to another node, the nodes and relationship should be returned as a subgraph.
Criteria 2.
If the relations are not unique, I'd like the node with the most relationships to be returned, as a subgraph with its relationships and related nodes.
If several nodes tie under criteria 2, I want all of their subgraphs to be returned.
Or, put in the context of this graph:
Give me the people who have unique games, and if other people have the same games, give me the person with the most games. If several people tie, return all of them.
Or actually, return the whole subgraph, not only the person.
To clarify what I am after here is a picture that describes the result I want to get. The ordering of the result is not important.
Disjoint subgraph A, because of Criteria 1, Andrew is the only person who has Bubble Bobble.
Disjoint subgraph B, because of Criteria 1, Johan is the only person who has Puzzle Bobble 1.
Disjoint subgraph C, because of Criteria 2, Julia since she has the most games.
Disjoint subgraph D, because of Criteria 2, Anna since she comes in tie with Julia having the most games.
Worth noting is that Johan's relationship to Puzzle Bobble 2 is not returned, because it's not unique and he does not have the most games.
Is this a problem you could solve with only Neo4j and is it a good idea?
If you could solve it how would you do it in Cypher?
Create script:
CREATE (p1:Person {name:"Johan"}),
(p2:Person {name:"Julia"}),
(p3:Person {name:"Anna"}),
(p4:Person {name:"Andrew"}),
(v1:Videogame {name:"Puzzle Bobble 1"}),
(v2:Videogame {name:"Puzzle Bobble 2"}),
(v3:Videogame {name:"Puzzle Bobble 3"}),
(v4:Videogame {name:"Puzzle Bobble 4"}),
(v5:Videogame {name:"Bubble Bobble"}),
(p1)-[:HAS]->(v1),
(p1)-[:HAS]->(v2),
(p2)-[:HAS]->(v2),
(p2)-[:HAS]->(v3),
(p2)-[:HAS]->(v4),
(p3)-[:HAS]->(v2),
(p3)-[:HAS]->(v3),
(p3)-[:HAS]->(v4),
(p4)-[:HAS]->(v5)
I feel like this solution might not be quite what you're looking for, but it could be a good start:
MATCH (game:Videogame)<-[:HAS]-(owner:Person)
OPTIONAL MATCH (owner)-[:HAS]->(other_game:Videogame)
WITH game, owner, count(other_game) AS other_game_count
ORDER BY other_game_count DESC
RETURN game, collect(owner)[0]
Here the query:
Finds all of the games and their owners (games without owners will not be matched)
Does an OPTIONAL MATCH against any other games those owners might own (by doing an optional match we're saying that it's OK if they own zero)
Passes through each game/owner pair along with a count of the number of games owned by that owner, sorting so that those with the most games come first
Returns the first owner for each game (the order is preserved when doing the collect)
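Because `collect(owner)[0]` keeps only one owner per game, ties (Julia and Anna in the sample data) are not all returned. The Python sketch below shows the tie-aware result the question asks for, using the HAS relationships from the create script; `top_owners_per_game` is a hypothetical helper and this is only an illustration of the expected output, not a Cypher solution.

```python
# HAS relationships copied from the create script in the question.
HAS = {
    "Johan": ["Puzzle Bobble 1", "Puzzle Bobble 2"],
    "Julia": ["Puzzle Bobble 2", "Puzzle Bobble 3", "Puzzle Bobble 4"],
    "Anna": ["Puzzle Bobble 2", "Puzzle Bobble 3", "Puzzle Bobble 4"],
    "Andrew": ["Bubble Bobble"],
}


def top_owners_per_game(has=HAS):
    """For each game, return every owner tied for the most games owned."""
    owners = {}
    for person, games in has.items():
        for game in games:
            owners.setdefault(game, []).append(person)
    result = {}
    for game, people in owners.items():
        best = max(len(has[p]) for p in people)
        # Keep all owners that reach the maximum, not just the first one.
        result[game] = sorted(p for p in people if len(has[p]) == best)
    return result


print(top_owners_per_game())
```

Johan is kept for Puzzle Bobble 1 (he is its only owner) but dropped for Puzzle Bobble 2, matching the expected result described in the question.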
