I need to get user-based recommendations with graphaware and I don't know how to do that. As far as I can see, all I seem to get from graphaware's neo4j-reco are item-similarities as in 'people who bought a also bought b'. But what I'm interested in is user-based recommendations like 'recommended for you, based on your previous purchases'. Any idea how to do that?
GraphAware-Reco is mainly a skeleton helping you build enterprise-grade recommendation engines atop a neo4j database.
This means that it provides base classes and an architecture that you need to extend yourself with your own logic.
Taking your requirement (purchase history), a naive way to get started is to look at the characteristics of the products purchased.
Let's say user 1 purchased an iPhone and an iPad, which have the following characteristics:
iphone brand : apple, category: electronics
ipad brand: apple, category: electronics
You can create a first engine that matches potential candidates based on those characteristics. This engine extends CypherEngine with the following query:
MATCH (n:User {id: 111})-[:PURCHASED]->(product)
WITH distinct product
MATCH (product)-[:HAS_CHARACTERISTIC]->(c)<-[:HAS_CHARACTERISTIC]-(reco)
RETURN reco, count(*) AS score
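To make the scoring of that query concrete, here is a minimal in-memory sketch in Python. It is not GraphAware code: the CHARACTERISTICS dictionary and the function name are made up for illustration, and the data simply extends the iPhone/iPad example above.

```python
from collections import Counter

# Hypothetical stand-in for (product)-[:HAS_CHARACTERISTIC]->(c) relationships.
CHARACTERISTICS = {
    "iphone": {"brand:apple", "category:electronics"},
    "ipad": {"brand:apple", "category:electronics"},
    "macbook": {"brand:apple", "category:electronics"},
    "galaxy": {"brand:samsung", "category:electronics"},
    "novel": {"category:books"},
}


def recommend_by_characteristics(purchased, characteristics=CHARACTERISTICS):
    """Score candidates by the number of (purchased product, shared characteristic) pairs."""
    scores = Counter()
    for product in set(purchased):  # mirrors WITH DISTINCT product
        for candidate, traits in characteristics.items():
            if candidate == product:
                continue  # within one MATCH pattern, reco cannot be the product itself
            shared = len(characteristics[product] & traits)
            if shared:
                scores[candidate] += shared  # mirrors count(*) AS score
    return scores


if __name__ == "__main__":
    print(recommend_by_characteristics(["iphone", "ipad"]).most_common())
```

Note that products the user already owns still show up as candidates here, just as they would in the Cypher query, so some filtering of already purchased items is still needed.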
Another approach you can combine with this one is to find people who bought the same items as the user and see what else they bought. For that you create another engine with the following query:
MATCH (user:User {id: 111})-[:PURCHASED]->(product)
WITH distinct product, user
MATCH (product)<-[:PURCHASED]-(collab)
WHERE collab <> user
MATCH (collab)-[:PURCHASED]->(reco)
RETURN reco, count(*) AS score
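The same co-purchase logic can be sketched in plain Python to see what the score means. The PURCHASES data and the function name are assumptions for illustration only, not part of GraphAware-Reco.

```python
from collections import Counter

# Hypothetical stand-in for (user)-[:PURCHASED]->(product) relationships.
PURCHASES = {
    "u1": {"iphone", "ipad"},
    "u2": {"iphone", "macbook"},
    "u3": {"iphone", "ipad", "watch"},
    "u4": {"novel"},
}


def recommend_by_co_purchase(user, purchases=PURCHASES):
    """Score products by how often they co-occur with the user's own purchases."""
    scores = Counter()
    for product in purchases[user]:
        for collab, basket in purchases.items():
            if collab == user or product not in basket:
                continue  # mirrors WHERE collab <> user
            for reco in basket:
                scores[reco] += 1  # mirrors count(*) AS score
    return scores


if __name__ == "__main__":
    print(recommend_by_co_purchase("u1").most_common())
```

As in the Cypher version, items the user already bought are scored too (they are what connects the user to the other buyers), so they have to be filtered out afterwards.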
When you use those two engines together, GraphAware Reco automatically combines the scores from each engine into a single score.
You can find an example of a CypherEngine in the tests : https://github.com/graphaware/neo4j-reco/blob/master/src/test/java/com/graphaware/reco/neo4j/engine/CypherEngineTest.java
You can also add a blacklist so that items the user has already bought are not recommended.
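As a rough illustration of combining engine scores and applying a blacklist, here is a small Python sketch. It assumes that scores are combined by simple summation, which is an assumption of this example and not a statement about how GraphAware-Reco does it internally. All function names are hypothetical.

```python
from collections import Counter


def combine_scores(*engine_scores):
    """Sum the per-item scores produced by several engines (assumed combination rule)."""
    total = Counter()
    for scores in engine_scores:
        total.update(scores)  # Counter.update adds counts for mappings
    return total


def apply_blacklist(scores, already_bought):
    """Drop items the user has already purchased."""
    return {item: s for item, s in scores.items() if item not in already_bought}


def top_recommendations(engine_scores, already_bought, limit=10):
    """Combine, filter, and sort by descending score (ties broken by name)."""
    combined = apply_blacklist(combine_scores(*engine_scores), already_bought)
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    content = {"macbook": 4, "galaxy": 2, "ipad": 2}
    collaborative = {"watch": 2, "macbook": 1, "ipad": 2}
    print(top_recommendations([content, collaborative], {"iphone", "ipad"}))
```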
As I said, this is a first step. If you have a big catalog with a lot of purchases, you might consider doing background computations (e.g. computing the similarity between products and relating each product only to its top-k nearest neighbours, and likewise computing similarity between users based on their purchases and relating similar users to each other).
GraphAware-Reco offers facilities for background computation jobs. GraphAware-Reco-Enterprise (not open-sourced) comes with predefined algorithms for similarity computation between items, as well as an Apache Spark integration that moves the similarity computation outside the Neo4j JVM and writes the results/relationships back to Neo4j.
Related
I am working on a dating app where users can "like" or "dislike" other users and get matched.
As you can imagine the most important query of the app would be:
Give me a stack of nearby user profiles that I have NOT liked/disliked before.
I tried to work on this with a document database (Firestore) and figured it's simply not suitable for such kind of application and hence landed in the graph database world which is new and fascinating to me.
I understand that by nature a graph database retrieves data by tracing through relationships and makes relationships first-class citizens. My question is: what if the nodes I am trying to get are those with no relationship from the given node? What would the query look like? Can anyone provide an example query?
Edit:
- added nearby criteria to the query statement
This is definitely possible, here is a query example :
MATCH (me:Profile {name: "Chris"})
MATCH (other:Profile) WHERE other <> me AND NOT (me)-[:LIKES]->(other)
RETURN other
As stated in the comments on your original question, this might not scale well on a large dataset. That said, it is uncommon to use only one criterion for matching. For example, the pool of candidate profiles can be narrowed by:
geolocation
profiles at depth 2 (who likes me, who else do they like, and do those people like me?)
shared interests
age group
skin color
...
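To illustrate how the "not yet rated" filter can be combined with a geolocation criterion, here is a minimal in-memory sketch in Python. The data layout (a dictionary of coordinates and a dictionary of already rated profiles), the 25 km default radius, and the function names are all assumptions made for this example.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def candidate_stack(me, profiles, rated, max_km=25.0):
    """Return nearby profile names, closest first, that `me` has not liked or disliked.

    profiles: name -> (lat, lon)
    rated:    name -> set of names that person has already liked or disliked
    """
    seen = rated.get(me, set())
    here = profiles[me]
    nearby = [
        (haversine_km(here, loc), name)
        for name, loc in profiles.items()
        if name != me and name not in seen
    ]
    return [name for dist, name in sorted(nearby) if dist <= max_km]


if __name__ == "__main__":
    profiles = {
        "Chris": (52.52, 13.405),
        "Ana": (52.50, 13.40),
        "Ben": (52.53, 13.41),
        "Cleo": (48.137, 11.575),
    }
    print(candidate_stack("Chris", profiles, {"Chris": {"Ben"}}))
```

In a real deployment the distance filter would normally run first, inside the database, so that the "not yet rated" check only has to look at a small candidate set.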
My database contains hotels, reviews of hotels, terms (i.e. words) in reviews and topics (e.g. there could be a topic talking "Staff" containing terms describing the hotel staff) as nodes. Indices on all nodes are present. Relationships as follows: Hotel<--Review-->Term-->Topic
I am currently trying to find an efficient way of querying for topics that have paths to two or more specified hotels. In other words, I am interested in the common topics of two hotels. If hotel A has paths to topics 1,2,3 and hotel B has paths to topics 2,3,4 then the result should be 2,3.
I tried the following below but this seems very inefficient which is very likely due to the amount of possible paths between hotels and topics. Basically each word in a review could create a new path that has to be checked.
// show all topics that two hotels have in common
MATCH (h2:Hotel)<--(r2:Review)-->(t2:Term)-->(to:Topic)<--(t1:Term)<--(r1:Review)-->(h1:Hotel)
WHERE h1.id IN ["id1","id2"] AND h2.id IN ["id1","id2"] AND NOT h1.id=h2.id
RETURN h1.id,to.topic, count(to) AS topic_mentions
I am wondering if there's a faster way of dealing with this, if I were to implement this in java or similar language I'd probably try doing a BFS starting at each hotel and then taking the overlap of what I find. I am fairly certain that adding the transitive edges as direct edges Hotel-->Topic would speed this up, but my limited database design knowledge told me that this might be unnecessarily redundant and not a good practice?
I tried to do the id matching before the pattern matching with another MATCH and WITH clause, but this didn't speed anything up; I think the problem really lies in the pattern matching itself.
I created something similar for searching knowledge bases, and a direct relationship between Hotels and Topics will make this search dead easy, and it'll be faster. For example, to search for all topics that more than one Hotel has in common, you'd use:
MATCH (h1:Hotel)-[:TOPIC]->(t:Topic)
MATCH (h2:Hotel)-[:TOPIC]->(t:Topic)
WHERE h1 <> h2
RETURN h1.id, h2.id, t.topic, count(t) AS topic_mentions
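Note that this returns one row per shared topic for every pair of hotels, with each pair appearing in both orders, which may or may not be what you want. Add a filter on h1.id and h2.id to restrict it to specific hotels.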
I am fairly certain that adding the transitive edges as direct edges
Hotel--Topic would speed this up, but my limited database design
knowledge told me that this might be unnecessarily redundant and not a
good practice?
All that would be doing is making an implicit relationship explicit, which is one of the things that make graph databases so powerful. There is the maintenance aspect to be concerned about, namely that if someone updates the words in a review, you have to make sure that the (hotel)-[:TOPIC]->(topic) relationships are still valid, but you'd have to do that in your original design anyway, so no loss there.
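The idea of making the implicit Hotel-to-Topic relationship explicit, and then taking the overlap between hotels, can be sketched in Python as follows. The three input dictionaries are a hypothetical flattened view of the Hotel<--Review-->Term-->Topic graph, and the sketch assumes each term belongs to at most one topic.

```python
def hotel_topics(review_hotel, review_terms, term_topic):
    """Collapse Hotel<--Review-->Term-->Topic into a hotel -> set(topics) mapping.

    review_hotel: review id -> hotel id
    review_terms: review id -> iterable of terms in that review
    term_topic:   term -> topic (assumed: one topic per term)
    """
    topics = {}
    for review, hotel in review_hotel.items():
        bucket = topics.setdefault(hotel, set())
        for term in review_terms.get(review, ()):
            if term in term_topic:
                bucket.add(term_topic[term])
    return topics


def common_topics(topics_by_hotel, hotel_ids):
    """Topics reachable from every one of the given hotels."""
    sets = [topics_by_hotel.get(h, set()) for h in hotel_ids]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    review_hotel = {"r1": "A", "r2": "A", "r3": "B"}
    review_terms = {"r1": ["friendly", "clean"], "r2": ["pool"], "r3": ["helpful", "pool", "noisy"]}
    term_topic = {"friendly": "Staff", "helpful": "Staff", "clean": "Room",
                  "pool": "Facilities", "noisy": "Location"}
    by_hotel = hotel_topics(review_hotel, review_terms, term_topic)
    print(sorted(common_topics(by_hotel, ["A", "B"])))
```

The precomputed mapping plays the role of the direct (hotel)-[:TOPIC]->(topic) relationships: it has to be refreshed when reviews change, but the lookup no longer walks every review and term.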
I need to implement a suggestion system for my project.
In this system we should recommend people based on parameters like current city, education, friends of friends, etc.
I have designed this by creating (or updating) MAY_KNOW relationships whenever users edit their profile or become friends with someone, and I retrieve them with MATCH u-[r:MAY_KNOW]-x RETURN * ORDER BY r.weight so that people can find those most similar to them.
But I think this is not best practice, because the MAY_KNOW relationships from/to every user can soon reach millions, and scanning and sorting them will be costly.
Do you have a better idea?
It depends a bit on the data structure. I assume there are relationships to cities, education facilities and friends. So you don't actually have MAY_KNOW relationships, as those are only inferred?
It also depends on whether you want to create a cross product between all your users (how many are there?) and how you want to filter out unrelated people.
Perhaps check out this blog post from Max: http://maxdemarzi.com/2013/04/19/match-making-with-neo4j/
So something like this query might work (depending on the data volume I'd rewrite it in the Java API).
match (p:Person {id:{user_id}})
match (p)-[:LIVES_IN]->(:City)<-[:LIVES_IN]-(other)
match (p)-[:GRADUATED]->(:School)<-[:GRADUATED]-(other)
match (p)-[:KNOWS]->(:Person)<-[:KNOWS]-(other)
RETURN other
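The query above only returns people who share a city, a school and a friend all at once. A looser variant that scores any overlap at query time, instead of storing weighted MAY_KNOW relationships, might look like the following Python sketch. The dictionaries, the weights (1, 2, 3) and the function name are assumptions chosen for illustration.

```python
from collections import Counter


def suggest_people(user, lives_in, graduated, knows, weights=(1, 2, 3)):
    """Rank people the user does not know yet by shared city, schools and friends.

    lives_in:  person -> city
    graduated: person -> set of schools
    knows:     person -> set of people that person knows
    weights:   (city, school, mutual friend) weights, purely illustrative
    """
    w_city, w_school, w_friend = weights
    already_known = knows.get(user, set())
    scores = Counter()
    for other in set(lives_in) | set(graduated) | set(knows):
        if other == user or other in already_known:
            continue
        if user in lives_in and lives_in.get(other) == lives_in[user]:
            scores[other] += w_city
        scores[other] += w_school * len(graduated.get(user, set()) & graduated.get(other, set()))
        scores[other] += w_friend * len(already_known & knows.get(other, set()))
    ranked = [(person, s) for person, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda ps: (-ps[1], ps[0]))


if __name__ == "__main__":
    lives_in = {"ann": "Paris", "bob": "Paris", "cat": "Rome", "dan": "Paris", "eve": "Rome"}
    graduated = {"ann": {"MIT"}, "bob": {"MIT"}, "cat": {"MIT"}, "dan": set()}
    knows = {"ann": {"dan"}, "bob": {"dan"}, "cat": set(), "dan": {"ann", "bob"}}
    print(suggest_people("ann", lives_in, graduated, knows))
```

Because the candidates are derived from the user's own city, schools and friends, nothing has to be stored or re-sorted per user pair when a profile changes.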
I'm building a simple twitter graph and I'm having a discussion with other members of the team about how to implement it.
I think that option A is the best due to performance and simplicity, but other members of the team (especially the project leader, who prefers C) aren't convinced, due to inexperience with the platform.
I've never used a graph in a production application so I don't have a strong argument when the PL starts comparing it to the C option.
So I ask you which option would you choose based on your experience?
Elements:
Twitterusers
Lists
Interests
A) neo4j graph
Nodes:
twitteruser
list
interest
Relations:
follows(user1, user2)
member_of(user, list)
interested_in(user, interest)
B) Same graph, but split into smaller graphs to increase performance.
C) Simple neo4j graph and a relational db to query the data.
.graph:
Nodes:
twitteruser
Relations:
similarity(user1, user2)
.relational db: the nodes of A will translate to tables and the
relationships will be done through many to many keys.
From what you described, I personally don't see any reason for options B and C. The scenario you describe looks perfect for a graph DB such as Neo4j.
If you choose option C, you'll have a lot of code that is only doing id translation and synchronization between the two databases. You'd better have a good reason for using two stores like this.
I am currently using Solr 1.4 (soon to upgrade to 3.3). The friendship table is pretty standard:
id | follower_id | user_id
I would like to perform a regular keyword solr search and order the results by degrees of separation as well as the standard score ordering. From the result set, given the keyword matched any of my immediate friends, they would show up first. Secondly would be the friends of my friends, and thirdly friends by 3rd degree of separation. All other results would come after.
I am pretty sure Solr doesn't offer any 'pre-baked' way of doing this therefore I would likely have to do a join on MySQL to properly order the results. Curious if anyone has done this before and/or has some insights.
It's simply not possible in Solr. However, if you aren't too restricted and could use another platform for this, consider Neo4j.
This kind of "connections and degrees" query is exactly where Neo4j steps in.
http://neo4j.org/
One way might be to create fields like degree_1, degree_2 etc. and store the list of friends at degree x in the field degree_x. Then you could fire multiple queries - the first restricting the results to those who have you in degree_1, the second restricting the results to those who have you in degree_2 and so on.
It is a bit complicated, but it is the only solution I could think of that uses Solr.
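The degree_1, degree_2, ... lists could be computed with a breadth-first search over the friendship data before indexing. Here is a minimal Python sketch. The adjacency dictionary is a hypothetical in-memory version of the friendship table, and the field names simply follow the degree_x naming used above.

```python
from collections import deque


def degrees_of_separation(start, friends, max_degree=3):
    """Return {person: degree} for everyone within max_degree hops of start."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if degree[person] == max_degree:
            continue  # do not expand beyond the last degree we care about
        for friend in friends.get(person, ()):
            if friend not in degree:
                degree[friend] = degree[person] + 1
                queue.append(friend)
    del degree[start]
    return degree


def degree_fields(start, friends, max_degree=3):
    """Group people into degree_1 .. degree_N lists, ready to be stored as fields."""
    fields = {f"degree_{d}": [] for d in range(1, max_degree + 1)}
    for person, d in sorted(degrees_of_separation(start, friends, max_degree).items()):
        fields[f"degree_{d}"].append(person)
    return fields


if __name__ == "__main__":
    friends = {"me": ["a", "b"], "a": ["me", "c"], "b": ["me"],
               "c": ["a", "d"], "d": ["c", "e"], "e": ["d"]}
    print(degree_fields("me", friends))
```

The lists have to be recomputed and reindexed whenever friendships change, which is the main cost of this approach.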
I haven't represented a graph in Solr before, but I think at a high level, this is what you could do. First, represent people as nodes and the social network as a graph in the database. Implement a transitive closure function in SQL to allow you to walk the graph. Then index the result into Solr with the social network info stored in payloads, for example.
I was able to achieve this by performing multiple queries and with the scope "with" to restrict to the id's of colleagues, 2nd and 3rd degree colleagues, using the id's and using mysql to do the select.
#search_1 = perform_search(1, options)
#search_2 = perform_search(2, options)
if degree == 1
  with(:id).any_of(options[:colleague_ids])
elsif degree == 2
  with(:id).any_of(options[:second_degree_colleagues])
end
It's kind of a dirty solution, as I have to perform multiple Solr queries, but until I can use dynamic field sorting options (Solr 3.3, not currently supported by Sunspot) I really don't know any other way to achieve this.
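If the result set is small enough to fetch in one go, an alternative to one query per degree is to re-rank the hits on the client. The Python sketch below orders hits by degree of separation first and by search score second. The shape of the hits and the degree lookup are assumptions for illustration and are not part of Solr or Sunspot.

```python
def order_by_degree(hits, degree_of, unknown_degree=4):
    """Sort search hits by degree of separation, then by descending score.

    hits:      list of (user_id, score) pairs as returned by the search
    degree_of: user_id -> degree of separation (1, 2 or 3)
    Users missing from degree_of are treated as unknown_degree and sort last.
    """
    return sorted(hits, key=lambda hit: (degree_of.get(hit[0], unknown_degree), -hit[1]))


if __name__ == "__main__":
    hits = [(1, 0.9), (2, 0.4), (3, 0.7), (4, 0.8)]
    degree_of = {2: 1, 3: 1, 4: 2}
    print(order_by_degree(hits, degree_of))
```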