I am new to Mahout, and am still playing around with it.
My question is, is it appropriate to combine Item-Item and User-Item?
My use case is a social networking application that recommends items to the current user based on the user's own historical data (with higher priority), combines that with recommendations derived from the user's friends' historical data (with lower priority), and displays the result as a list ordered by rating.
The reasoning is that a new user might not have much historical data in the system yet, so we can recommend something based on his friends' historical data. Once the user accumulates enough historical data, the recommendations should be based more on that.
Is it appropriate to design the system in this way?
Thank you for your time,
George
This is fairly simple to write. You can create recommendations for the user and then combine them with recommendations for the other users. A simple version of this logic is additive: merge the lists of recommendations by adding the scores for items that appear in both lists. For example, you might sum the N friends' recs together and add the user's own recs weighted by N, then take recommendations from the merged list.
This doesn't exist in the project per se, but it's quite easy to write a method that does this on the List<RecommendedItem> that comes back from recommend().
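A minimal sketch of such a merge, assuming Mahout's Taste classes (RecommendedItem, GenericRecommendedItem) and a hypothetical ownWeight parameter for how heavily to favor the user's own list:

    import java.util.*;
    import org.apache.mahout.cf.taste.impl.recommender.GenericRecommendedItem;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;

    public class RecMerger {

        // Sum friends' scores, add the user's own scores weighted by ownWeight
        // (e.g. the number of friends), and return the top howMany items.
        public static List<RecommendedItem> merge(List<RecommendedItem> own,
                                                  List<List<RecommendedItem>> friends,
                                                  float ownWeight,
                                                  int howMany) {
            Map<Long, Float> scores = new HashMap<>();
            for (RecommendedItem item : own) {
                scores.merge(item.getItemID(), ownWeight * item.getValue(), Float::sum);
            }
            for (List<RecommendedItem> friendRecs : friends) {
                for (RecommendedItem item : friendRecs) {
                    scores.merge(item.getItemID(), item.getValue(), Float::sum);
                }
            }
            List<RecommendedItem> merged = new ArrayList<>();
            for (Map.Entry<Long, Float> e : scores.entrySet()) {
                merged.add(new GenericRecommendedItem(e.getKey(), e.getValue()));
            }
            merged.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
            return merged.subList(0, Math.min(howMany, merged.size()));
        }
    }

Each input list would come from recommender.recommend(userID, howMany), once for the user and once for each friend.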
I am trying to use collaborative filtering to recommend items to the user based on their past purchases. I have created a user vector representing his usage, and an item vector (A) with values populated as the probability of B given A. The objective is to capture, to some extent, the items sold together in the item vector representation. Now I need to find the time when these recommendations should be presented. As the items I am recommending are of periodic use, timing is very important.
So I am trying to explore constraint-based recommendations to make my recommendations time-sensitive. The approach I am thinking of is to create a time-sensitive constraint based on the last date of purchase and the average consumption rate. But the problem is that creating constraints at the user level will become computationally difficult.
I need your suggestions regarding this approach, or any better way to implement the same thing. All I want is to develop a recommendation engine using customers' usage data for items that are consumed and need to be purchased again. I need to output a list of recommendations as well as the timing for presenting each recommendation to the user.
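For concreteness, this is the kind of per-item constraint I have in mind: the next recommendation time is the last purchase date plus the average interval between that item's past purchases (the purchase-history structure here is just a hypothetical placeholder):

    import java.time.LocalDate;
    import java.time.temporal.ChronoUnit;
    import java.util.List;

    public class ReplenishmentTimer {

        // Purchase dates for one user/item pair, sorted ascending.
        // Returns the estimated date to surface the recommendation, or null
        // if there is not enough history to estimate a consumption rate.
        public static LocalDate nextRecommendationDate(List<LocalDate> purchaseDates) {
            if (purchaseDates.size() < 2) {
                return null;
            }
            long totalDays = ChronoUnit.DAYS.between(
                    purchaseDates.get(0), purchaseDates.get(purchaseDates.size() - 1));
            long avgIntervalDays = totalDays / (purchaseDates.size() - 1);
            return purchaseDates.get(purchaseDates.size() - 1).plusDays(avgIntervalDays);
        }
    }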
Thanks
The way I see it, there are two basic options here that you can pursue. On the one hand, the temporal features can be incorporated as additional information, turning the system into a kind of hybrid recommender. The Python package "lightfm" is a good example.
On the other hand, the problem can also be modeled as a time series problem. A well-known paper dealing with next basket recommendations is "A Dynamic Recurrent Model for Next Basket Recommendation". Here too, there are already implementations on GitHub.
Can someone please help me clarify this?
I am currently using collaborative filtering (ALS), which returns a recommendation list with scores corresponding to the recommended items. In addition to this, I am boosting the scores (+0.1) if an item carries a tag that matches what the user has specified they prefer, such as "romantic movies". To me, this is a hybrid collaborative approach, since it boosts the collaborative filtering results with content-based filtering (please correct me if I am wrong).
Now, what if I took the same approach without doing collaborative filtering? Would it be considered content-based filtering, since I would still be recommending dishes based on the content and attributes of each dish, matched against what the user has specified they like (such as "romantic movies")?
The reason I'm confused is that I've seen content-based filtering that applies an algorithm such as Naive Bayes, whereas this approach would be more like a simple search over the items' content.
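For reference, the boosting step I described looks roughly like this (the item IDs, ALS scores, and tag sets are hypothetical placeholders; the scores would come from the ALS step):

    import java.util.*;

    public class TagBooster {

        // Add a flat boost (e.g. 0.1) to an item's ALS score when the item
        // carries at least one tag the user has said they prefer.
        public static Map<Long, Double> boost(Map<Long, Double> alsScores,
                                              Map<Long, Set<String>> itemTags,
                                              Set<String> preferredTags,
                                              double boost) {
            Map<Long, Double> boosted = new HashMap<>();
            for (Map.Entry<Long, Double> e : alsScores.entrySet()) {
                Set<String> tags = itemTags.getOrDefault(e.getKey(), Collections.emptySet());
                boolean match = tags.stream().anyMatch(preferredTags::contains);
                boosted.put(e.getKey(), match ? e.getValue() + boost : e.getValue());
            }
            return boosted;
        }
    }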
I'm not sure you can do what you suggest, because without CF you have no score to boost.
You are indeed using a hybrid, much the same as the Universal Recommender. To do purely content-based recommendations you have to implement two methods:
Personalized recommendations: here you have to look at the content of items the user preferred and find items with similar content. This can be done by using something like the Mahout spark-rowsimilarity job to create a model of item: list-of-similar-items, then indexing the results with a search engine and using the user's preferred item IDs as the query (see the search-index sketch after this list). This is being added to the Universal Recommender.
"People who liked this also liked these": these are items similar to the one being viewed, for example, and are the same for all users. They are not personalized and so are useful even for anonymous users with no history. This can be done with the same index as above, but using the items similar to the one being viewed as the query. One might think to use only the similar items themselves, but by using them as a query you can put the categorical boost in the search engine query and have boosted items returned. This already works in the Universal Recommender, but the similar items are not in the model yet.
That said, mixing content with collaborative filtering will almost surely give better results, since CF works better when the data is available. The only time to rely on content-based recommendations alone is when your catalog is made of one-off items that never get enough CF interactions, or when you have rich content with a short lifetime, like breaking news.
BTW, anyone who wants to help add the pure content-based part to the Universal Recommender can contact its new maintainers at ActionML.com.
I'm trying to develop a trust-aware collaborative filtering approach. I have two Epinions datasets: one with who trusts whom, <ID_truster, ID_trusted>, and one with ratings, <ID_truster, ITEM, RATING>.
How can I make recommendations (user-user based) using only ratings from people I trust?
At the moment I only make recommendations using the second dataset, taking every user into consideration.
Thank you
The closest thing I can think of is to use a user-neighborhood-based approach and only include trusted users in the neighborhood. You would need to write some extra code for that, to disqualify untrusted users by returning a very negative similarity value for them. Look at the UserSimilarity interface.
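A minimal sketch of that idea, wrapping any existing Mahout UserSimilarity; the trust map loaded from your <ID_truster, ID_trusted> dataset is a hypothetical structure you would build yourself:

    import java.util.Collection;
    import java.util.Map;
    import java.util.Set;
    import org.apache.mahout.cf.taste.common.Refreshable;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.similarity.PreferenceInferrer;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    // Returns a very negative similarity for any pair not present in the trust
    // dataset, so untrusted users never enter a nearest-N or threshold neighborhood.
    public class TrustAwareUserSimilarity implements UserSimilarity {

        private final UserSimilarity delegate;
        private final Map<Long, Set<Long>> trustMap; // truster -> trusted users

        public TrustAwareUserSimilarity(UserSimilarity delegate, Map<Long, Set<Long>> trustMap) {
            this.delegate = delegate;
            this.trustMap = trustMap;
        }

        @Override
        public double userSimilarity(long userID1, long userID2) throws TasteException {
            Set<Long> trusted = trustMap.get(userID1);
            if (trusted == null || !trusted.contains(userID2)) {
                return -1.0; // effectively disqualifies untrusted users
            }
            return delegate.userSimilarity(userID1, userID2);
        }

        @Override
        public void setPreferenceInferrer(PreferenceInferrer inferrer) {
            delegate.setPreferenceInferrer(inferrer);
        }

        @Override
        public void refresh(Collection<Refreshable> alreadyRefreshed) {
            delegate.refresh(alreadyRefreshed);
        }
    }

You would then pass this wrapped similarity to something like NearestNUserNeighborhood and GenericUserBasedRecommender built on your ratings DataModel.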
I've created a graph model for a social network and need some concrete advice regarding the design with respect to scaling. Pardon the n00bness of these questions, but I'm not finding many clear examples out there...
NOTE: the status update and activity nodes/relationships are linked lists, with the newest entries constantly being placed at the head of the list.
Linked lists allow for news feed generation, but there could be hundreds of records per user. I presume the LIMIT clause isn't sufficient even though the data is in descending order by date. Do I have to keep a separate linked list that holds only the most recent 10 status/activity updates and constantly replace the head of that list to get better activity feed generation, or will one properly sorted list do the job (with a LIMIT clause)?
These nodes all have properties (JSON data with content, IDs, etc.). How do "global" indexes come into play here so that I can find, for example, users that like Depeche Mode without waiting a lifetime for results? I know how to add a node to an index, just wondering if I'm missing a part of the picture here...
Security: logins and passwords. I would presume a graph database could store them, but I'd also presume it's a security risk at this point. Would it be better to keep this in Postgres, etc.?
How would you improve this model to handle scalability? Imagine 20 million users banging away on this...
Imagine 40 million users - what's wrong with this model when it comes to scalability?
Part 1.
You can write Cypher or Gremlin queries that do what you want. Remember that you can traverse forwards and backwards on edges. Given a user, it should always be relatively constant time to pull up the last ten things they did.
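As a hedged sketch of what that could look like from Java (assuming the Neo4j 4.x Java driver and a made-up schema where a LATEST_STATUS relationship points at the head of each user's list and NEXT links run newest to oldest):

    import org.neo4j.driver.*;

    public class FeedQuery {
        public static void main(String[] args) {
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                    AuthTokens.basic("neo4j", "password"));
                 Session session = driver.session()) {
                // Walk the linked list from the head, newest update first.
                String cypher =
                    "MATCH (u:User {id: $userId})-[:LATEST_STATUS]->(head:Status) " +
                    "MATCH p = (head)-[:NEXT*0..9]->(s:Status) " +
                    "RETURN s ORDER BY length(p) LIMIT 10";
                Result result = session.run(cypher, Values.parameters("userId", 42));
                result.list().forEach(r -> System.out.println(r.get("s").asMap()));
            }
        }
    }

Because the traversal starts at the user node and follows at most ten NEXT relationships, the cost stays roughly constant no matter how long the full list grows.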
Part 2.
If you are representing a band as an entity of a certain type, index on that attribute. Then you'll be able to pull out that node and traverse outwards to find all the users who like that band. If you don't have an independent entity, or it is somehow implicit, you'll want to enable full text search for your respective graph database.
Part 3.
Learn more about security. The only thing you would be storing would be a properly hashed string of the user's password. At that point you would be fine using any graph db and good security practices.
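For example, a minimal sketch assuming the jBCrypt library (org.mindrot.jbcrypt); only the salted hash ever gets stored as a node property:

    import org.mindrot.jbcrypt.BCrypt;

    public class PasswordStorage {

        // Hash a plaintext password with a per-password salt before storing it.
        public static String hashForStorage(String plaintextPassword) {
            return BCrypt.hashpw(plaintextPassword, BCrypt.gensalt(12));
        }

        // Verify a login attempt against the stored hash; the plaintext is never persisted.
        public static boolean verify(String candidate, String storedHash) {
            return BCrypt.checkpw(candidate, storedHash);
        }
    }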
Part 4/5.
Once you have one user, worry about the next thousand.
When you have a thousand users, worry about the next hundred thousand.
When you have one hundred thousand, worry about the next million.
When you have a million users, you can start worrying about the questions you asked.
Until you have at least 0.1% of the users/volume you want to scale to, it's mental masturbation to try and ask questions about how to scale up to a certain size.
Suppose a user buys n items from my website; I need an algorithm or a method (using Mahout maybe? How?) so that I can recommend k similar items to the user. I don't have user ratings.
The k recommendations need to be based upon the user's buying history (his n items).
The items have fields such as "name", "author", and "keywords", and I need to recommend the most similar items. What happens if I add user ratings on top of this? How would I take those into account?
I've read the Mahout docs, but it always seems to need some sort of ratings. How will I provide ratings if, say, I have only had a couple of customers so far?
There is no perfect way to build a recommender.
Recommendations without user ratings
Calculate the item-item similarity from the keywords, name, and author. Then you can propose the most similar items the user has not seen yet. As items don't change often, you can store the similarity table somewhere.
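A minimal sketch of that, using a simple Jaccard similarity over each item's name/author/keyword tokens (how you tokenize the fields is up to you; the structures here are placeholders):

    import java.util.*;

    public class ItemContentSimilarity {

        // Jaccard similarity between two items' token sets (keywords + author + name words).
        public static double jaccard(Set<String> a, Set<String> b) {
            if (a.isEmpty() && b.isEmpty()) {
                return 0.0;
            }
            Set<String> intersection = new HashSet<>(a);
            intersection.retainAll(b);
            Set<String> union = new HashSet<>(a);
            union.addAll(b);
            return (double) intersection.size() / union.size();
        }

        // Precompute the item-item similarity table once; items rarely change,
        // so it can be stored and reused for every recommendation request.
        public static Map<String, Map<String, Double>> similarityTable(Map<String, Set<String>> itemTokens) {
            Map<String, Map<String, Double>> table = new HashMap<>();
            for (String i : itemTokens.keySet()) {
                Map<String, Double> row = new HashMap<>();
                for (String j : itemTokens.keySet()) {
                    if (!i.equals(j)) {
                        row.put(j, jaccard(itemTokens.get(i), itemTokens.get(j)));
                    }
                }
                table.put(i, row);
            }
            return table;
        }
    }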
Recommendations with user ratings
If you don't want to have user ratings, you could also store each user's view history. This results in a "boolean" rating (only "seen" and "not seen"). With this pseudo-rating, you can generate recommendations with user similarity: users who have seen similar things are similar.
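A minimal sketch of that with Mahout's Taste API, assuming a hypothetical purchases.csv whose lines are just "userID,itemID" with no rating column (Mahout treats those as boolean preferences):

    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class BooleanPrefExample {
        public static void main(String[] args) throws Exception {
            DataModel model = new FileDataModel(new File("purchases.csv"));

            // Log-likelihood similarity ignores rating values, so it suits boolean data.
            UserSimilarity similarity = new LogLikelihoodSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(25, similarity, model);
            GenericBooleanPrefUserBasedRecommender recommender =
                new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);

            // Top 5 recommendations for user 1.
            List<RecommendedItem> recs = recommender.recommend(1L, 5);
            recs.forEach(r -> System.out.println(r.getItemID() + " " + r.getValue()));
        }
    }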
For further reading, I strongly recommend the book Mahout in Action. It contains a lot of information about how to use Mahout.