Predicting the item to sell, given a list of items - machine-learning

We have a data set that maps each customer to the products he buys, like
c1->{P1, P2, P5}
c2->{P3, P5, P4}
c3->{P5, P2, P3}
....
On that basis we need to recommend a product to the customer.
Say for customer cx we need to recommend a product. We have the data on what cx is buying from the set above, and we run Apriori to figure out the recommendation, but for a big data set it's very slow.
Can someone please give us a suggestion for how to crack this problem?

I assume the items the merchant is selling are your training data, and a random item is your testing data. The most probable item to sell will then depend on the "features" of the items the merchant currently sells; "features" means the price of the item, its category, and whatever other details you have. To decide on an algorithm, I recommend you have a look at the feature space. If it forms small clusters, then even a nearest-neighbour search would work well. If the distribution is complex, you can go for an SVM. There are various data-visualization techniques to help here; applying PCA and visualizing the first two dimensions is a good choice.
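Before reaching for heavier models, a fast baseline for the original basket data is plain item-to-item co-occurrence counting, which avoids Apriori's candidate-generation cost entirely. A sketch in plain Python on the toy baskets from the question:

```python
from collections import Counter

# Purchase history from the question: customer -> set of products bought
purchases = {
    "c1": {"P1", "P2", "P5"},
    "c2": {"P3", "P5", "P4"},
    "c3": {"P5", "P2", "P3"},
}

def recommend(customer_items, purchases, top_n=1):
    """Score products by how often they co-occur with the customer's
    existing items, then return the best products he doesn't own yet."""
    scores = Counter()
    for items in purchases.values():
        if items & customer_items:            # basket overlaps with the customer
            for p in items - customer_items:  # count only products not yet owned
                scores[p] += 1
    return [p for p, _ in scores.most_common(top_n)]

print(recommend({"P2"}, purchases))  # -> ['P5'] (P5 co-occurs with P2 twice)
```

This runs in a single pass over the baskets, so it scales linearly with the data set size; it is a sketch of the idea, not a tuned recommender.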

Related

Right Method for an ML Model

I'm taking my first steps in AI and ML.
I chose a project for myself that I want to tackle with ML, but I'm unsure which method to use.
Business case: a customer can place offers and set a date on which he wants to receive his products.
He is able to change the amount of products he buys at any time.
I have to deal with the costs of unbought products, and with missed profit in case I produced less than he wanted.
I have plenty of data from past transactions containing the original amount of products ordered and the amount I sent to the customer.
My goal is a predictive-analytics model that can tell me, after a customer has changed the number of products in an order, how likely it is that this change is final.
I'm really new to this topic and don't quite grasp all the information on the different methods. I know classification and regression are the big players and can be implemented in different ways, but is one of those approaches a fit for my problem?
Many thanks in advance.
You can go with a classification-based approach, since your goal is to predict whether the order change is final or not. The probability of the change being final comes from the model's predicted class probabilities, and the accuracy/F1 score tells you how trustworthy those predictions are: the higher the values, the more successful the predictions. In layman's terms, think of this as classifying whether the order is final or not.
You would go for a regression approach if you were trying to predict a value based on the order change. For example, if you want to predict the cost of the next order change, then you have to use regression.
As I understand it, your use case matches the first scenario.
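As a concrete illustration of the classification approach, here is a minimal sketch with sklearn. The feature names (days before delivery, relative size of the change, number of prior changes) and all numbers are made-up assumptions, not part of the original question:

```python
from sklearn.linear_model import LogisticRegression

# Toy training rows, one per past order change (hypothetical features):
# [days_before_delivery, relative_size_of_change, number_of_prior_changes]
X = [[30, 0.5, 0], [2, 0.1, 3], [25, 0.8, 1],
     [1, 0.05, 4], [20, 0.4, 0], [3, 0.2, 2]]
y = [0, 1, 0, 1, 0, 1]  # 1 = the change turned out to be final

clf = LogisticRegression().fit(X, y)

# For a new order change, read off the probability that it is final
# rather than a hard yes/no label
prob_final = clf.predict_proba([[2, 0.15, 3]])[0][1]
```

Any probabilistic classifier would do here; logistic regression is just a simple starting point that exposes calibrated-ish probabilities directly.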

Incorporating prior knowledge to machine learning models

Say I have a data set of students with features such as income level, gender, parents' education levels, school, etc., and the target variable is, say, passing or failing a national exam. We can train a machine learning model to predict from these values whether a student is likely to pass or fail (in sklearn, using predict_proba we can get the probability of passing).
Now say I have a different set of information, unrelated to the previous data set, which contains each school and the percentage of students from that school who passed the national exam last year and in the years before, say schoolA: 10%, schoolB: 15%, etc.
How can I use this additional knowledge to improve my model? This data is surely valuable (students from certain schools have a higher chance of passing the exam due to their educational facilities, qualified staff, etc.).
Do I somehow add this information as a new feature to the data set? If so, what is the recommended way? Or do I use this information after the model prediction and somehow combine the two to get a final probability? Obviously an average or weighted average doesn't work, because the second data set has probabilities below 20%, which drags the total probability very low. How do data scientists usually incorporate this kind of prior knowledge? Thank you
You can try different ways to add this data and see whether your model is able to learn from it. More likely you'll see right away that this additional data just confuses the model, mostly because you're already providing more precise data on each student of the school, and the model has more freedom to use that information.
But neural-network training is all about continuous trial and error, so you should definitely try training with all the data you can imagine, to see whether you get a decent error in the end.
Using the average pass percentage of each student's school as a new feature is worth a try.
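That last suggestion, attaching each school's historical pass rate to every student as a new feature, can be sketched with a pandas merge. The values and column names below are toy placeholders:

```python
import pandas as pd

# Student-level data (toy values, hypothetical columns)
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "school": ["schoolA", "schoolB", "schoolA"],
    "income_level": [2, 3, 1],
})

# School-level prior knowledge: historical pass rate per school
school_pass_rate = pd.DataFrame({
    "school": ["schoolA", "schoolB"],
    "hist_pass_rate": [0.10, 0.15],
})

# Attach the school's pass rate to every student as a new column;
# a left merge keeps every student even if a school has no history
students = students.merge(school_pass_rate, on="school", how="left")
```

The model can then learn on its own how much weight the school-level prior deserves relative to the per-student features.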

Which classification to choose?

I have a huge amount of Yelp data and I have to classify the reviews into 8 different categories.
Categories
Cleanliness
Customer Service
Parking
Billing
Food Pricing
Food Quality
Waiting time
Unspecified
Reviews can contain multiple categories, so I have used multi-label classification. But I am confused about how to handle positive/negative. A review may be positive for food quality but negative for customer service, e.g. "food taste was very good but staff behaviour was very bad" contains positive Food Quality but negative Customer Service. How can I handle this case? Should I do sentiment analysis before classification? Please help me.
I think your data is very similar to the Restaurants reviews dataset. It contains around 100 reviews, with a varied number of aspect terms in each (more information). So you can use Aspect-Based Sentiment Analysis like this:
1. Aspect term extraction
Extract the aspect terms from the reviews.
2. Aspect polarity detection
For a given set of aspect terms within a sentence, determine whether the polarity of each aspect term is positive or negative.
3. Aspect category identification
Given a predefined set of aspect categories (e.g., Food Quality, Customer Service), identify the aspect categories discussed in a given sentence.
4. Aspect category polarity
Given a set of pre-identified aspect categories (e.g., Food Quality, Customer Service), determine the polarity (positive, negative) of each aspect category.
Please see this for more information about a similar project.
I hope this helps.
Yes, you would need sentiment analysis. Why don't you tokenize your data, i.e. find the relevant words in each sentence, and then link the aspect words with their sentiment words? For example, in "food was good but the cleanliness was not appropriate" you have [food, good, cleanliness, not, appropriate]: "food" links with the term that follows it ("good"), and "cleanliness" with "not appropriate".
You can then classify each aspect into two classes, e.g. 1/0 for good and bad, or add more classes depending on your case.
Then you would have data as such:
FEATURE          | VAL
-----------------|----
Cleanliness      |  0
Customer Service | -1
Parking          | -1
Billing          | -1
Food Pricing     | -1
Food Quality     |  1
Waiting time     | -1
Unspecified      | -1
I have given this just as an example, where -1, 1, 0 stand for no review, good and bad respectively. You can add more categories, e.g. 0, 1, 2 for bad, fair, good.
I may not be the best at answering this, but this is how I see it.
Note: you need to understand that your model cannot be perfect, because that's what machine learning is all about. It cannot give a perfect classification; it will be wrong for certain inputs, and it learns from those errors and improves over time.
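The token-linking idea above can be sketched in a few lines of plain Python. The word lists and the aspect-to-category mapping below are made-up placeholders, not a real sentiment lexicon:

```python
# Hypothetical sentiment lexicon and aspect mapping (placeholders only)
POSITIVE = {"good", "great", "tasty", "appropriate"}
NEGATIVE = {"bad", "rude", "dirty"}
ASPECTS = {"food": "Food Quality", "staff": "Customer Service",
           "cleanliness": "Cleanliness"}

def aspect_sentiment(review):
    """Link each aspect word to the nearest following sentiment word,
    honouring a preceding 'not'; 1 = positive, -1 = negative."""
    words = review.lower().replace(",", "").split()
    result = {}
    for i, w in enumerate(words):
        if w in ASPECTS:
            negated = False
            for nxt in words[i + 1:]:
                if nxt == "not":
                    negated = True
                elif nxt in POSITIVE:
                    result[ASPECTS[w]] = -1 if negated else 1
                    break
                elif nxt in NEGATIVE:
                    result[ASPECTS[w]] = 1 if negated else -1
                    break
    return result

aspect_sentiment("food taste was very good but staff behaviour was very bad")
# -> {'Food Quality': 1, 'Customer Service': -1}
```

A rule-based sketch like this is brittle; it only illustrates how aspect-sentiment pairs can be extracted before feeding them to a classifier.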
There are many ways of doing multi-label classification.
The simplest one would be having a model for each class: if the review achieves a certain threshold score for a label, you apply that label to the review.
This treats the classes independently, but it seems like a good solution to your problem.
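The one-model-per-class ("binary relevance") idea can be sketched with sklearn's OneVsRestClassifier. The reviews, labels, and the 0.5 threshold below are made-up toy assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy reviews with (possibly multiple) category labels per review
reviews = [
    "food taste was very good",
    "staff behaviour was very bad",
    "food was good but staff was very rude",
    "the parking lot was always full",
]
labels = [{"Food Quality"}, {"Customer Service"},
          {"Food Quality", "Customer Service"}, {"Parking"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)   # one binary column per category

vec = TfidfVectorizer()
X = vec.fit_transform(reviews)

# One independent binary classifier per label; a review receives every
# label whose classifier's probability passes the threshold
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
probs = clf.predict_proba(vec.transform(["staff was rude but food was good"]))
```

Each column of `probs` is an independent per-label score, so thresholding them separately naturally handles reviews that are positive on one category and negative on another.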

Identifying machine learning data to make predictions

As a learning exercise I plan to implement a machine learning algorithm (probably a neural network) to predict what users earn trading stocks, based on shares bought, shares sold and transaction times. The data sets below are test data I've made up.
Acronyms:
tab=millisecond time apple bought
asb=apple shares bought
tas=millisecond apple sold
ass=apple shares sold
tgb=millisecond time google bought
gsb=google shares bought
tgs=millisecond google sold
gss=google shares sold
training data :
username,tab,asb,tas,ass,tgb,gsb,tgs,gss
a,234234,212,456789,412,234894,42,459289,0
b,234634,24,426789,2,234274,3,458189,22
c,239234,12,156489,67,271274,782,459120,3
d,234334,32,346789,90,234254,2,454919,2
classifications :
a earned $45
b earned $60
c earned ?
d earned ?
Aim : predict earnings of users c & d based on training data
Are there any data points I should add to this data set? Should I perhaps use alternative data? As this is just a learning exercise of my own creation, I can add any feature that may be useful.
This data will need to be normalised; are there any other concepts I should be aware of?
Perhaps I should not use time as a feature parameter, as shares can bounce up and down depending on the time.
You might want to approach your problem in the following order:
1) Prediction of an individual stock's future value based on all stocks' historical data.
2) Prediction of a combination of stocks' total future value based on a portfolio and all stocks' historical data.
3) A short-term buy-sell strategy for managing a portfolio (when, and in what amount, to buy/sell which stock(s)).
If you can do 1) well for a particular stock, it's probably a good starting point for 2). 3) might be your goal, but I put it last because it's even more complicated.
I'll make some assumptions below and focus on how to solve 1), hopefully. :)
I assume at each timestamp, you have a vector of all possible features, e.g.:
stock price of company A (this is the target value)
stock price of other companies B, C, ..., Z (other companies might affect company A directly or indirectly)
52 week lowest price of A, B, C, ..., Z (long-term features begin)
52 week highest price of A, B, C, ..., Z
monthly highest/lowest price of A, B, C, ..., Z
weekly highest/lowest price of A, B, C, ..., Z (short-term features begin)
daily highest/lowest price of A, B, C, ..., Z
is revenue report day of A, B, C, ..., Z (really important features begin)
change of revenue of A, B, C, ..., Z
change of profit of A, B, C, ..., Z
semantic score of company profile from social networks of A, ..., Z
... (imagination helps here)
And I assume you have almost all of the above features at every fixed time interval.
I think an LSTM-like neural network is very relevant here.
Don't use the username along with the training data - the network might make associations between the username and the $ earned. Including it would factor in the user to the output decision, while excluding it ensures the network will be able to predict the $ earned for an arbitrary user.
Using the parameters you suggest, it seems to me impossible to predict earnings.
The main reason is that the input parameters don't correlate with the output value.
Your input values contradict themselves: consider a case where the same input should produce different output values. If that can happen, you won't be able to predict any output for such an input.
Going further, a trader's earnings depend not only on the number of shares bought/sold, but also on the price of each of them. This brings us to the problem of giving the neural network two equal inputs while desiring different outputs.
How do you define 'good' parameters to predict the desired output in such a case?
I suggest first of all looking for people who make such estimations, then trying to define the list of parameters they take into account.
If you succeed, you will get a huge list of variables.
Then you can try to build a model, for example using a neural network.
Besides normalisation you'll also need scaling. Another question I have for you concerns the classification of the stocks. In your example you use Google and Apple, which are considered blue-chip stocks. I want to clarify: do you want to predict earnings only for Google and Apple, or for any combination of two stocks?
If you want to make predictions only for Google and Apple with the data you have, then you can apply just normalisation and scaling with some kind of recurrent neural network. Recurrent NNs are better at prediction tasks than a simple feedforward model with backpropagation training.
But if you want to apply your training algorithm to more than just Google and Apple, I recommend splitting your training data into groups by some criterion. One example is dividing by stock capitalisation, say into five groups. If you decide on five groups, you can also apply equilateral encoding to decrease the number of input dimensions for NN learning.
Another kind of grouping you can think of is the stock's area of operation, for example agricultural, technological, medical, hi-end and tourist groups.
Let's say you decide on that grouping (agricultural, technological, medical, hi-end, tourist). The five groups then give you five entries in the NN's input layer (so-called one-hot encoding).
Let's say you want to feed in an agricultural stock. The input will then look like this:
1,0,0,0,0, x1, x2, ...., xn
where x1, x2, ...., xn are the other entries.
If you apply equilateral encoding instead, you'll have one dimension less (I'm too lazy to describe what it would look like).
Yet another idea for converting entries for a neural network is thermometer encoding.
One more idea to keep in mind: people usually lose money trading stocks, so your data set may be biased. If you randomly choose only 10 traders, they could all be losers, and your data set would not be fully representative. To avoid this bias, you need a big enough data set of traders.
And one more detail: you don't need to pass the user id into the NN, because the NN would then learn the trading style of a particular user and use it for prediction.
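The scaling-plus-grouping preprocessing described above can be sketched in a few lines. The sector list and the raw share counts are made-up examples:

```python
# Five hypothetical sector groups, encoded as five input entries
SECTORS = ["agricultural", "technological", "medical", "hi-end", "tourist"]

def encode_sector(sector):
    """One-hot encode a stock's sector group."""
    return [1 if sector == s else 0 for s in SECTORS]

def min_max_scale(values):
    """Scale numeric features into [0, 1] (simple normalisation)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Input row for an agricultural stock: the group encoding followed by
# the scaled numeric features (here, made-up share counts)
row = encode_sector("agricultural") + min_max_scale([212, 412, 42, 0])
```

In practice you would fit the min/max on the training set only and reuse those bounds at prediction time, but the shape of the input row is the point here.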
It seems to me you have more dimensions than data points. However, it might be the case that your observations lie in a linear subspace; you just need to compute the kernel (null space) of the data matrix shown above.
If the kernel has a larger dimension than the number of data points, then you do not need to add more data points.
There is another thing to look at: your classifier's VC dimension, since you don't want to add too many points to the dataset. But that is mostly theoretical in this example, and I'm just joking.

Using decision tree in Recommender Systems

I have a decision tree that is trained on the columns (Age, Sex, Time, Day, Views, Clicks) and classifies into two classes - Yes or No - which represent the buying decision for an item X.
Using these values,
I'm trying to predict the probability for 1000 samples (customers) which look like
('12','Male','9:30','Monday','10','3'),
('50','Female','10:40','Sunday','50','6')
........
I want to get an individual probability or score that will help me recognise which customers are most likely to buy item X. I want to be able to sort the predictions and show a particular item to only the 5 customers most likely to buy item X.
How can I achieve this?
Will a decision tree serve the purpose?
Is there any other method?
I'm new to ML, so please forgive any vocabulary errors.
Using a decision tree with a small sample set, you will definitely run into overfitting. Especially at the lower levels of the tree, you will have exponentially less data to train your decision boundaries. Your data set should have far more samples than the number of categories, and enough samples for each category.
Speaking of decision boundaries, make sure you understand how you are handling the data type of each dimension. For example, 'sex' is categorical data, whereas 'age', 'time of day', etc. are real-valued inputs (discrete/continuous), so different parts of your tree will need different formulations. Otherwise, your model might end up handling 9:30, 9:31, 9:32, ... as separate classes.
Try some other algorithms, starting with simple ones like k-nearest neighbours (KNN). Have a validation set to test each algorithm. Use MATLAB (or similar software) where you can use libraries to quickly try different methods and see which one works best. There is not enough information here to recommend something very specific.
In particular, I suggest you try KNN. KNN is able to capture affinity in the data: say product X is bought by people around age 20, during evenings, after about 5 clicks on the product page; KNN will tell you how close each new customer is to the customers who bought the item. Based on this you can just pick the top 5. It is very easy to implement and works well as a benchmark for more complex methods.
(Assuming views and clicks means the number of clicks and views by each customer for product X)
A decision tree is a classifier, and in general it is not suitable as the basis for a recommender system. But given that you are only predicting the likelihood of buying one item, not tens of thousands, it makes sense to use a classifier here.
You simply score all of your customers and retain the 5 whose probability of buying X is highest, yes. Is there any more to the question?
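The score-and-sort step can be sketched with a sklearn decision tree. The toy feature encoding (sex as 0/1, time as an hour, day as a weekday index) and all numbers below are made-up assumptions:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training rows: [age, sex, hour, weekday, views, clicks] -> bought (0/1)
X_train = [[12, 1, 9, 0, 10, 3], [50, 0, 10, 6, 50, 6],
           [35, 0, 14, 2, 5, 0], [22, 1, 20, 4, 30, 8],
           [60, 1, 11, 5, 2, 0], [28, 0, 18, 3, 40, 9]]
y_train = [0, 1, 0, 1, 0, 1]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# New customers to score, same encoding
customers = [[30, 0, 10, 6, 45, 7], [15, 1, 9, 0, 8, 1], [55, 0, 12, 5, 3, 0]]

# predict_proba gives P(buy) per customer; sort indices and keep the top 5
buy_prob = clf.predict_proba(customers)[:, 1]
top5 = sorted(range(len(customers)), key=lambda i: buy_prob[i], reverse=True)[:5]
```

Note that a plain decision tree's probabilities are just leaf-class fractions and tend to cluster at 0 and 1; a random forest or calibrated model gives smoother scores if the ranking matters.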
