I'm learning recommender systems. I have each user's age, profession, gender, etc. in the dataset, along with their interaction scores for the available videos. I also have characteristics of each video, such as genre and view count. I am thinking of applying collaborative filtering to this dataset, but most examples online ignore this side information, even though these characteristics can help explain why a certain user rated a certain video the way they did and can help produce more accurate predictions. Are there algorithms that take this information into account?
So I'm working on a project for my university where our current plan is to use the YouTube API and do some data analysis. We have some ideas, and we've been looking at the Terms of Service and the Developer Policies, but we're not entirely sure about a few things.
Our project does not focus on monetary gain, predicting a video's estimated income, or anything of that nature, nor on trying to determine user data such as passwords or usernames. It's much more about the content and statistics of the videos than anything else.
Our current ideas that we want to be sure are OK to do and use:
Determine the category of a video given its title
Determine the category of a video given its tags
Determine the category of a video given its description
Determine the category of a video given its thumbnail
Some combination of the above to create an ensemble model
Clustering videos by category/view counts
Sentiment analysis on comments
Trending topics over time
This is just a rough list for now, but I would love to reach out and figure out what we're allowed to use the data for.
I want to ask whether there's some sort of curation algorithm that arranges/sends results from a recommender system to a user.
For example, how does Twitter recommend feeds to users? Is there an algorithm that does that, or does Twitter just sort by the number of interactions a tweet has received (taking the posting time into account too)?
No, there is no separate curation step like that.
The recommendation model itself is built to do the ranking: it orders content using content-based filtering or collaborative filtering, based on the user's viewing stats.
Some approaches compute the correlation between a user's viewing stats and the content on Twitter, and then recommend the most strongly correlated items.
Cosine similarity (or cosine distance) can also be used to measure how close a user's viewing stats are to a given piece of content.
You should also explore other recommendation approaches built on algorithms such as Pearson correlation, weighted averages, etc.
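As a rough illustration of the cosine-similarity idea above, here is a minimal sketch; the topic features and the numbers are invented for the example, not anything Twitter actually uses.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical features: how much the user engages with each topic,
# and how strongly each tweet belongs to each topic.
user_profile = np.array([5.0, 0.0, 2.0, 1.0])   # sports, politics, tech, music
tweets = {
    "tweet_a": np.array([4.0, 0.0, 1.0, 0.0]),
    "tweet_b": np.array([0.0, 3.0, 0.0, 2.0]),
}

# Rank tweets by similarity to the user's profile and recommend the top ones.
ranked = sorted(tweets, key=lambda t: cosine_similarity(user_profile, tweets[t]), reverse=True)
print(ranked)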
I'm currently in the process of building a recommendation system with implicit data (e.g. clicks, views, purchases); however, much of the research I've looked at seems to skip the step of "aggregating implicit data". For example, how do you aggregate multiple clicks and purchases over time into a single user rating (as required by a standard matrix factorization model)?
I've been experimenting with several matrix factorization based methods, including Neural Collaborative Filtering, Deep Factorization Machines, LightFM, and Variational Autoencoders for Collaborative Filtering. None of these papers seems to address the issue of aggregating implicit data. They also do not discuss how to weight different types of user events (e.g. clicks vs. purchases) when calculating a score.
For now I've been using a confidence score approach (the confidence score corresponds to the count of events) as outlined in this paper: http://yifanhu.net/PUB/cf.pdf. However, this approach doesn't address incorporating other types of user events (beyond clicks), nor does it address negative implicit feedback (e.g. a ton of impressions with zero clicks).
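For reference, the confidence weighting I've been using is roughly the following; the alpha value and event count are placeholders, with alpha = 40 being the value suggested in the paper.

# Confidence weighting in the style of the paper above (Hu, Koren & Volinsky):
# each observed interaction count r_ui becomes a confidence c_ui = 1 + alpha * r_ui.
ALPHA = 40.0  # tunable; 40 is the paper's suggestion, validate on your own data

def confidence(event_count: int, alpha: float = ALPHA) -> float:
    return 1.0 + alpha * event_count

# Example: a user who clicked an item 3 times.
print(confidence(3))  # 121.0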
Anyway, I'd love some insight on this topic! Any thoughts at all would be hugely appreciated!
One method for building a recommendation system from implicit feedback is Bayesian Personalized Ranking (BPR). I also wrote an article on how it can be implemented using TensorFlow.
There's no "right" answer to the question of how to turn implicit feedback into explicit ratings. The answer depends on business requirements. If the task is to increase the click rate, you should try to use the clicks. If the task is to increase conversion, you need to work with purchases.
I am developing a website which will recommend recipes to visitors based on their data. I am collecting data from their profile, website activity, and Facebook.
Currently I have data like [username/userId, recipe rating, age, gender, type (veg/non-veg), cuisine (Italian/Chinese/etc.)]. Based on these features, I want to recommend new recipes which they have not visited.
I have implemented Spark's ALS (alternating least squares) algorithm. For this we have to prepare a CSV containing [userId, RecipesId, Rating] columns, then train on this data and create the model, tuning parameters such as lambda, rank, and number of iterations. The model then generates recommendations using PySpark:
model.recommendProducts(userId, numberOfRecommendations)
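Roughly, the full pipeline looks like this; the file path and the parameter values are placeholders.

from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="recipe-recommender")

# CSV rows of the form userId,RecipesId,Rating
ratings = sc.textFile("ratings.csv") \
            .map(lambda line: line.split(",")) \
            .map(lambda p: Rating(int(p[0]), int(p[1]), float(p[2])))

# rank, iterations and lambda_ are the knobs mentioned above; these values are placeholders.
model = ALS.train(ratings, rank=10, iterations=10, lambda_=0.01)

# Top-5 recipes for user 1.
print(model.recommendProducts(1, 5))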
The ALS algorithm accepts only three columns: userId, RecipesId, Rating. I am unable to include additional features (like type, cuisine, gender, etc.) beyond those three. I want to include those features, then train the model and generate recommendations.
Is there any other algorithm in which I can include the above features and generate recommendations?
Any help would be appreciated. Thanks.
Yes, there are a couple of other algorithms. For your case, I would suggest the Naive Bayes algorithm.
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
Since you are working on a web application, a JS solution would, I guess, come in handy.
(simple) https://www.npmjs.com/package/bayes
or for example:
(a bit more powerful) https://www.npmjs.com/package/naivebayesclassifier
In machine learning there is a family of algorithms called recommender systems, and among them are content-based recommender systems. They are mainly used to recommend products/movies based on customer reviews. You can apply the same approach, using customer reviews, to recommend recipes. For a better understanding of this algorithm, refer to these links:
https://www.youtube.com/watch?v=Bv6VkpvEeRw&list=PL0Smm0jPm9WcCsYvbhPCdizqNKps69W4Z&index=97
https://www.youtube.com/watch?v=2uxXPzm-7FY
You can also go with powerful classification algorithms such as the following (a rough sketch follows the list):
-> SVM: works very well if you have a large number of attributes.
-> Logistic Regression: works well if you have a huge amount of customer data.
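As a sketch of that classification framing, assume each training row combines a user's profile features with a recipe's features and is labelled 1 if the user liked the recipe; the column names and data below are invented for illustration.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training rows: one per (user, recipe) interaction.
df = pd.DataFrame({
    "age":     [25, 34, 45, 29],
    "gender":  ["F", "M", "F", "M"],
    "type":    ["veg", "nonveg", "veg", "veg"],
    "cuisine": ["Italian", "Chinese", "Italian", "Indian"],
    "liked":   [1, 0, 1, 0],   # 1 = the user rated the recipe highly
})

features = ["age", "gender", "type", "cuisine"]
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "type", "cuisine"])],
    remainder="passthrough",
)
clf = Pipeline([("pre", pre), ("logreg", LogisticRegression())])
clf.fit(df[features], df["liked"])

# Score an unseen (user, recipe) pair; recommend the recipes with the highest probability.
new_pair = pd.DataFrame([{"age": 30, "gender": "F", "type": "veg", "cuisine": "Indian"}])
print(clf.predict_proba(new_pair)[:, 1])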
You are looking for recommender systems using algorithms like collaborative filtering. I would suggest you go through Prof. Andrew Ng's short videos on the collaborative filtering algorithm and low-rank matrix factorization, and on building recommender systems. They are part of Coursera's Machine Learning course offered by Stanford University.
The course link:
https://www.coursera.org/learn/machine-learning
You can check week 9 for the content related to recommender systems.
I am currently working on software which connects users to jobs based on their user profiles. I ran text analytics on the job descriptions and derived important keywords from them. I have also collected user information from their profiles. Matching the jobs to the user profiles seems to be a challenging task. Are there any machine-learning-based algorithms which can be used for matchmaking?
OK, so basically, you have keywords for each job description and then you have some sort of text data (user profiles) to which you try to match those keywords.
Since your training data (user profiles) is not labeled, supervised learning will not help you here. Unsupervised learning (clustering) could help you find certain patterns (keywords) in a large set of user profiles, but you would certainly need to experiment with different techniques (such as Gaussian mixture models, etc.) and observe possible patterns.
A simpler thing you could do is derive/find keywords for each user profile as well (in other words, identify how many of your job keywords also appear in the user profile) and then compare the two using cosine similarity. You would then only need to determine a minimum similarity threshold; this would be a parameter to play with. Of course, you would need to vectorize your text data using bigrams or similar; if you use Python, scikit-learn already provides feature extraction. You could also apply a tf-idf vectorizer to both the job description and the user profile, but with a carefully chosen stop-word list. A rough sketch of this idea follows.
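Here is a minimal sketch of the tf-idf plus cosine-similarity idea, assuming the job descriptions and user profiles are plain strings; the example texts and the 0.2 threshold are made up.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

jobs = [
    "python developer machine learning pandas scikit-learn",
    "frontend engineer react typescript css",
]
profiles = [
    "data scientist experienced in python, pandas and machine learning",
    "designer with some css and react experience",
]

# Fit one vocabulary over jobs and profiles so both live in the same vector space.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
vectorizer.fit(jobs + profiles)

job_vecs = vectorizer.transform(jobs)
profile_vecs = vectorizer.transform(profiles)

# similarities[i, j] = cosine similarity between profile i and job j.
similarities = cosine_similarity(profile_vecs, job_vecs)

THRESHOLD = 0.2  # minimum similarity to count as a match; tune on real data
for i, row in enumerate(similarities):
    matches = [j for j, score in enumerate(row) if score >= THRESHOLD]
    print(f"profile {i} matches jobs {matches}")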