How to model a personalized threshold problem with machine learning

Assume I have a candidate selection system that generates product/user pairs for recommendation. Currently, to hold a quality bar for the recommended products, we trained a model to optimize for clicks on the recommended link, denoted pClick(product, user); its output is a score in (0, 1) representing how likely the user is to click on the recommended product.
For our initial launch we set a manually selected threshold, say T, that is the same for all users: we only send a user the recommendation when the pClick score exceeds T.
Now we realize this is not optimal: some users care less about recommendation quality while others have a higher bar, so a personalized threshold instead of the global T could improve overall relevance.
The goal is to output a threshold for each user, assuming we have training data on each user's activity and on user/product attributes.
The question is: how should we model this problem with machine learning? Any references or papers would be highly appreciated.
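For clarity, a minimal sketch of the setup described above: one global threshold T applied to every user's pClick score. All names and numbers here are illustrative placeholders, not part of the original system.
def should_send(pclick_score, threshold=0.3):
    """Send the recommendation only when the pClick score clears the threshold."""
    return pclick_score > threshold

# Today: the same T for every user.
candidates = [("user_1", "product_a", 0.42), ("user_2", "product_b", 0.18)]
sent = [(user, product) for user, product, score in candidates if should_send(score)]
print(sent)

# Goal of the question: replace the constant with a learned, per-user value,
# e.g. should_send(score, threshold=personal_threshold(user)), where
# personal_threshold is the model to be designed.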

Related

Right method for an ML model

I'm taking my first steps in AI and ML.
I chose a project for myself that I want to solve with ML, but I'm unsure which method to use.
Business case: a customer can place orders and set a date on which he wants to receive his products.
He is able to change the quantity of products he buys at any time.
I have to deal with the cost of unsold products, and with lost profit in case I produced less than he wanted.
I have plenty of data from past transactions containing the original quantity of products ordered and the quantity I actually sent to the customer.
My goal is a predictive analytics model which can tell me, after a customer has changed the number of products in an order, how probable it is that this change is final.
I'm really new to this topic and am not quite getting all the information about the different methods. I know classification and regression are the big players and can be implemented in different ways. But is one of those approaches a fit for my problem?
Many thanks in advance.
You can go with a classification-based approach, since your goal is to predict whether the order change is final or not. The probability for each individual change comes from the classifier itself (e.g. predict_proba in scikit-learn), while accuracy/F1 tell you how trustworthy those predictions are overall: the higher the values, the more successful the predictions. In layman's terms, think of this as classifying whether the order is final or not.
You have to go for a regression approach if you're trying to predict a value based on the order change. For example, if you want to predict the cost of the next order change, then you have to use regression.
As I understand it, your use case matches the first scenario.
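As a rough illustration, here is a minimal scikit-learn sketch of that first scenario. The column names and the toy data are placeholders invented for the example, not your real order history.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Toy order history: each row is one order change, labelled with whether it turned out to be final.
orders = pd.DataFrame({
    "original_qty":        [100, 80, 50, 120, 60, 90, 40, 70],
    "changed_qty":         [90, 80, 30, 125, 20, 85, 40, 100],
    "days_until_delivery": [10, 3, 25, 7, 30, 2, 14, 5],
    "change_was_final":    [1, 1, 0, 1, 0, 1, 1, 0],
})

X = orders[["original_qty", "changed_qty", "days_until_delivery"]]
y = orders["change_was_final"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("F1 on held-out data:", f1_score(y_test, clf.predict(X_test)))  # overall reliability
print(clf.predict_proba(X_test)[:, 1])  # per-change probability that the change is final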

Hybrid recommendation system with matrix factorization and linear regression

I'm following a tutorial for creating a recommendation system in BigQuery ML. The tutorial uses matrix factorization first to calculate user and item factors. In the end I have a model that can be queried with user ids or item ids to get recommendations.
The next step is feeding the factors and additional item + user features into a linear regression model to incorporate more context.
"Essentially, we have a couple of attributes about the movie, the
product factors array corresponding to the movie, a couple of
attributes about the user, and the user factors array corresponding to
the user. These form the inputs to our “hybrid” recommendations model
that builds off the matrix factorization model and adds in metadata
about users and movies."
I just don't understand why the dataset for linear regression excludes the user and item ids:
SELECT
p.* EXCEPT(movieId),
u.* EXCEPT(userId),
rating
FROM productFeatures p, userFeatures u
JOIN movielens.ratings r
ON r.movieId = p.movieId AND r.userId = u.userId
My question is:
How will I be able to get recommendations for a user from the linear model, when I don't have the user or item ids in the model?
Here you can find the full code:
https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive2/recommendation_systems/solutions/als_bqml_hybrid.ipynb
In the example you have shared, the goal is to fit a linear regression to the discovered factor values so that a novel set of factor values can be used to predict the rating. In this kind of setup, you don't want information about which samples are being used; the only crucial information is the training features (the factor scores) and the rating (the training/test label). For more on this topic, take a look at "Dimensionality reduction using non-negative matrix factorization for information retrieval."
If you included the movie ids and user ids as features, your regression would try to learn on those, which would either add noise to the model or learn spurious patterns such as low ids = lower score. This is possible especially if the ids follow some kind of order you're not aware of, such as chronological or by genre.
Note: You could use movie-specific or user-specific information to build a model, but you would have many, many dimensions of data, and that tends to create poorly performing models. The idea here is to avoid the problem of dimensionality by first reducing the dimensionality of the problem space. Matrix factorization is just one method among many to do this. See, for example, PCA, LDA, and word2vec.
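To make that concrete outside of BigQuery ML, here is a minimal scikit-learn sketch of the same idea. All the factor and metadata arrays are synthetic stand-ins for what the matrix factorization step would produce; the key point is that userId/movieId never enter the feature matrix.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_ratings, n_factors = 500, 8

user_factors = rng.normal(size=(n_ratings, n_factors))  # user factor vector for each rating
item_factors = rng.normal(size=(n_ratings, n_factors))  # item factor vector for each rating
user_meta = rng.normal(size=(n_ratings, 2))             # e.g. age, activity level
item_meta = rng.normal(size=(n_ratings, 3))             # e.g. genre indicators, release year
ratings = rng.uniform(1, 5, size=n_ratings)

# The ids are only used to *look up* these vectors; they are not features themselves.
X = np.hstack([user_factors, item_factors, user_meta, item_meta])
model = LinearRegression().fit(X, ratings)

# At serving time, fetch the factor vectors and metadata for a given (userId, movieId)
# pair by id, concatenate them in the same order, and call predict on that row.
print(model.predict(X[:1]))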

Incorporating prior knowledge into machine learning models

Say I have a data set of students with features such as income level, gender, parents' education levels, school, etc., and the target variable is, say, passing or failing a national exam. We can train a machine learning model to predict, given these values, whether a student is likely to pass or fail (say in sklearn, using predict_proba we can get the probability of passing).
Now say I have a different set of information which has nothing to do with the previous data set: the schools and the percentage of students from each particular school who passed that national exam last year and in the years before. Say, schoolA: 10%, schoolB: 15%, etc.
How can I use this additional knowledge to improve my model? This data is surely valuable (students from certain schools have a higher chance of passing the exam due to their educational facilities, qualified staff, etc.).
Do I somehow add this information as a new feature to the data set? If so, what is the recommended way? Or do I use this information after the model prediction and somehow combine the two to get a final probability? Obviously an average or a weighted average doesn't work, because the second data set has pass rates below 20%, which drags the combined probability very low. How do data scientists usually incorporate this kind of prior knowledge? Thank you.
You can try different ways to add this data and see whether your model is able to learn from it. More likely you'll see right away that the additional data just confuses the model, mostly because you're already providing more precise data on each student of the school, and the model has more freedom to use that information.
But artificial neural network training is all about continuous trial and error, so you should definitely try training with all the data you can imagine and see whether you get a decent error in the end.
Adding the average pass percentage of each student's school as a new feature is worth trying.
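A minimal sketch of that feature join, with made-up school names, pass rates, and student rows standing in for the real data:
import pandas as pd

# Per-student data (toy values).
students = pd.DataFrame({
    "school":       ["schoolA", "schoolB", "schoolA", "schoolC"],
    "income_level": [2, 3, 1, 2],
    "passed":       [0, 1, 0, 1],
})

# Prior knowledge: historical pass rate per school (illustrative numbers).
school_pass_rate = {"schoolA": 0.10, "schoolB": 0.15, "schoolC": 0.30}

# Map the school-level prior onto each student as an extra column; the classifier
# is then trained on this enriched feature set exactly as before.
students["school_pass_rate"] = students["school"].map(school_pass_rate)
print(students)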

In a content-based recommender system, how to judge per-user rather than per-rating?

I'm studying the recommender systems from the Andrew Ng course on Coursera, and this question popped into my mind.
In the course, Andrew does recommendations for movies, like Netflix does.
We have an output matrix Y of ratings of various movies, where each cell Y(i,j) is the rating given by user j to movie i. If the user has not rated it, Y(i,j)=?
Assuming we are doing linear regression, we had the following minimization objective, which sums over every observed rating and adds regularization:
min over θ^(1), ..., θ^(n_u) of
(1/2) · Σ_{j=1..n_u} Σ_{i : r(i,j)=1} ( (θ^(j))ᵀ x^(i) − Y(i,j) )² + (λ/2) · Σ_{j=1..n_u} Σ_{k=1..n} ( θ_k^(j) )²
My question is, doesn't this calculate on a per-rating basis? As in, all ratings are equal. So if someone rates 100 movies, he has more effect on the algorithm than someone who rates only 10 movies.
I was wondering if it is possible to judge on a per-user basis, i.e. all users are equal.
It is definitely possible to apply a weight to the loss function, with either weight = 1/ratings_for_user[u] or weight = 1/sqrt(ratings_for_user[u]), where ratings_for_user[u] is the number of ratings given by the user who produced that particular sample. Whether it's a good idea or not is another question.
To answer that question, I would first ask: is this more meaningful to the problem you are really trying to solve? If it is, ask the second question: does the model you built work well? Does it have a good cross-validation score?
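A minimal sketch of such per-user weighting, assuming a flat table of ratings and any scikit-learn estimator that accepts sample_weight (the feature vectors here are arbitrary stand-ins):
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

ratings = pd.DataFrame({
    "user":   ["a", "a", "a", "b", "c", "c"],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})
X = np.random.default_rng(0).normal(size=(len(ratings), 5))  # stand-in feature vectors

# Weight each rating by 1 / (#ratings from that user) so every user contributes equally.
counts = ratings.groupby("user")["rating"].transform("count")
weights = 1.0 / counts            # or 1.0 / np.sqrt(counts) for the softer variant

model = LinearRegression().fit(X, ratings["rating"], sample_weight=weights)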

Using decision tree in Recommender Systems

I have a decision tree trained on the columns (Age, Sex, Time, Day, Views, Clicks), which classifies samples into two classes - Yes or No - representing the buying decision for an item X.
Using these values,
I'm trying to predict the buying probability for 1000 samples (customers) which look like ('12','Male','9:30','Monday','10','3'),
('50','Female','10:40','Sunday','50','6')
........
I want to get an individual probability or score which will help me recognize which customers are most likely to buy the item X. So I want to be able to sort the predictions and show a particular item to only the 5 customers most likely to buy item X.
How can I achieve this ?
Will a decision tree serve the purpose?
Is there any other method?
I'm new to ML so please forgive me for any vocabulary errors.
Using a decision tree with a small sample set, you will definitely run into an overfitting problem, especially at the lower levels of the tree, where you have exponentially less data to train your decision boundaries. Your data set should have a lot more samples than the number of categories, and enough samples for each category.
Speaking of decision boundaries, make sure you understand how you are handling the data type for each dimension. For example, 'sex' is categorical data, whereas 'age', 'time of day', etc. are real-valued inputs (discrete/continuous), so different parts of your tree will need different formulations. Otherwise, your model might end up handling 9:30, 9:31, 9:32... as separate classes.
Try some other algorithms, starting with simple ones like k-nearest neighbour (KNN). Have a validation set to test each algorithm. Use Matlab (or similar software) where you can use libraries to quickly try different methods and see which one works best. There is not enough information here to recommend something very specific.
I suggest you try KNN too. KNN is able to capture affinity in data. Say, a product X is bought by people around age 20, during evenings, after about 5 clicks on the product page. KNN will be able to tell you how close each new customer is to the customers who bought the item. Based on this you can just pick the top 5. Very easy to implement and works great as a benchmark for more complex methods.
(Assuming views and clicks mean the number of views and clicks by each customer for product X.)
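A minimal sketch of that KNN idea, with synthetic, already numerically encoded data standing in for the real customer features:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X_past = rng.normal(size=(200, 6))        # past customers (Age, Sex, Time, Day, Views, Clicks), encoded
y_past = rng.integers(0, 2, size=200)     # 1 = bought item X

knn = KNeighborsClassifier(n_neighbors=10).fit(X_past, y_past)

X_new = rng.normal(size=(1000, 6))        # new customers to rank
affinity = knn.predict_proba(X_new)[:, 1] # fraction of the 10 nearest neighbours who bought X
top5 = np.argsort(affinity)[::-1][:5]     # indices of the 5 customers closest to past buyers
print(top5, affinity[top5])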
A decision tree is a classifier, and in general it is not suitable as a basis for a recommender system. But, given that you are only predicting the likelihood of buying one item, not tens of thousands, it kind of makes sense to use a classifier.
You simply score all of your customers and retain the 5 whose probability of buying X is highest, yes. Is there any more to the question?
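For completeness, the same scoring step with the decision tree itself; the training data below is random filler, and in practice Sex/Day would be one-hot encoded and Time converted to a numeric hour as noted above:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train = np.column_stack([
    rng.integers(12, 70, 200),   # Age
    rng.integers(0, 2, 200),     # Sex, encoded 0/1
    rng.integers(0, 24, 200),    # Time, as hour of day
    rng.integers(0, 7, 200),     # Day of week, 0-6
    rng.integers(0, 100, 200),   # Views
    rng.integers(0, 10, 200),    # Clicks
])
y_train = rng.integers(0, 2, size=200)   # bought item X: Yes=1 / No=0

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

X_new = np.column_stack([
    rng.integers(12, 70, 1000), rng.integers(0, 2, 1000), rng.integers(0, 24, 1000),
    rng.integers(0, 7, 1000), rng.integers(0, 100, 1000), rng.integers(0, 10, 1000),
])
p_buy = tree.predict_proba(X_new)[:, 1]  # probability of "Yes" for each of the 1000 customers
top5 = np.argsort(p_buy)[::-1][:5]       # keep only the 5 most likely buyers
print(top5, p_buy[top5])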
