Advice on classification approach

I need to classify incoming car rentals, but the historical data I could use for training is in "grouped" form, and I can't see how to train a classification model on it.
My incoming data is a list of car model, quantity, and unit price:
Chevrolet Spark, 1, 196.91
Fiat 500, 1, 196.91
Toyota Prius Hybrid, 3, 213.73
This incoming data is currently classified manually and saved grouped by class, with the total price per group (the Chevrolet and the Fiat are Economy, the Prius is Hybrid):
Economy, 393.82
Hybrid, 641.19
This problem should be solvable by machine learning, but I can't figure out how to build a training set for a supervised classifier. Any guidance appreciated.
Thanks

A naive Bayes classifier should do what you are trying to do. You can use the price as the feature and learn from what is already tagged.
However, I don't see how you can get consistent data using the TOTAL price to classify, since the number of cars varies from one group to another. You would have to use the unit price.

There are lots of algorithms that provide multiclass classification, but could you explain more about what you're trying to predict? From what you've written, it sounds more like a scenario for an ETL process than a machine learning model.
If I understand your example correctly, an incoming record with a car model of "Chevy Spark" or "Fiat 500" would always be labeled "Economy", while an incoming record with a car model of "Toyota Prius Hybrid" would be labeled "Hybrid". A simple lookup table would do the job here - no need for fancy machine learning mathematics. :)
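A minimal sketch of the lookup-table idea, using the three rows from the question. The `CAR_CLASS` table and the function name `group_totals` are hypothetical; in practice the mapping would come from whatever reference data the manual classifiers use. It also shows that the grouped historical totals are simply quantity times unit price, summed per class.

```python
from collections import defaultdict

# Hypothetical lookup table: car model -> rental class (taken from the question's example).
CAR_CLASS = {
    "Chevrolet Spark": "Economy",
    "Fiat 500": "Economy",
    "Toyota Prius Hybrid": "Hybrid",
}

def group_totals(rows):
    """Classify each (model, quantity, unit_price) row and sum the price per class."""
    totals = defaultdict(float)
    for model, quantity, unit_price in rows:
        # Models missing from the table are flagged rather than guessed.
        car_class = CAR_CLASS.get(model, "Unknown")
        totals[car_class] += quantity * unit_price
    # Round to cents to hide floating-point noise.
    return {car_class: round(total, 2) for car_class, total in totals.items()}

incoming = [
    ("Chevrolet Spark", 1, 196.91),
    ("Fiat 500", 1, 196.91),
    ("Toyota Prius Hybrid", 3, 213.73),
]
print(group_totals(incoming))  # {'Economy': 393.82, 'Hybrid': 641.19}
```

Only models that never appear in the table would need anything smarter than this, and those are the cases where a learned classifier might start to pay off.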

Related

Which prediction model should we use to predict the list of colleges for a student

I have a training data set containing college names, student rank, branch, and college cutoff. Which prediction model should I use to predict the list of colleges a student will be admitted to, given the student's rank, the college cutoff, and the branch?
I am new to machine learning.
I expect the output to be a list of colleges to which a student can be admitted, rather than whether a particular college is allocated to the student.

Your problem can be treated as a multiclass classification problem where every college becomes a class. You can use a simple random forest model and predict the class probabilities for every student record. Since you are using probabilities, the model will return the list of colleges along with the probability of each. Set a probability threshold and take the colleges above that threshold as your result.
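The thresholding step could look roughly like the sketch below. It is library-independent: the college names, probabilities, and the helper `colleges_above_threshold` are hypothetical. With scikit-learn, the two inputs would typically be `model.classes_` and one row of `model.predict_proba(X)` from a fitted `RandomForestClassifier`.

```python
def colleges_above_threshold(classes, probabilities, threshold=0.2):
    """Return (college, probability) pairs at or above the threshold, most likely first."""
    pairs = [(c, p) for c, p in zip(classes, probabilities) if p >= threshold]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Hypothetical class labels and predicted probabilities for one student.
classes = ["College A", "College B", "College C", "College D"]
probabilities = [0.05, 0.45, 0.30, 0.20]

shortlist = colleges_above_threshold(classes, probabilities, threshold=0.25)
print(shortlist)  # [('College B', 0.45), ('College C', 0.3)]
```

The threshold itself is a tuning choice: a lower value gives longer lists with more false positives, so it is worth checking against held-out students.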

This is a multiclass classification problem. If you are new, I suggest using tree-based models such as the random forest classifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html), or trying XGBoost if you are not getting good enough results from random forest. They are easy to use and perform well on multiclass classification problems. They also give you feature importances easily, which will help you explain your model.

What classifier should I use to find out expected subsequent purchased category for each user based on the month he is buying?

Which machine learning classifier should I use to predict a user's expected next purchase category based on the month in which the user is purchasing?
The dataset consists of the columns
uuid, date, price, product_id, category
I am guessing the Naive Bayes algorithm. Any better suggestions?

In ML, you can't rely on a single model. Try every model you can and pick the one that gives the best result. I generally start with simple models such as Ridge, logistic regression, Elastic Net, SVM, etc., which still work very well. Then I look into tree-based and gradient boosting models.
If you have product/user features, you can try a KNN model on those features, which finds the k most similar neighbours.
Also, if you would like to look into recommendation systems, this may be helpful. In product recommendation, these models work pretty well.
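Whichever model is tried, the purchase log first has to become a supervised data set. One possible framing, sketched below, turns each user's history into (month, current category) -> next category pairs and adds a most-frequent-next-category baseline to compare real classifiers against. The sample rows and function names are hypothetical, dates are assumed to be ISO strings, and the `price` and `product_id` columns are left out for brevity.

```python
from collections import Counter, defaultdict
from datetime import date

def build_pairs(purchases):
    """purchases: iterable of (uuid, iso_date, category).
    Returns a list of ((month, category), next_category) training pairs."""
    by_user = defaultdict(list)
    for uuid, iso_date, category in purchases:
        by_user[uuid].append((date.fromisoformat(iso_date), category))
    pairs = []
    for history in by_user.values():
        history.sort()  # chronological order per user
        for (day, category), (_, next_category) in zip(history, history[1:]):
            pairs.append(((day.month, category), next_category))
    return pairs

def fit_baseline(pairs):
    """Most frequent next category for each (month, category) key."""
    counts = defaultdict(Counter)
    for key, next_category in pairs:
        counts[key][next_category] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}

# Hypothetical purchase log.
purchases = [
    ("u1", "2023-01-05", "electronics"),
    ("u1", "2023-01-20", "household"),
    ("u2", "2023-01-11", "electronics"),
    ("u2", "2023-02-02", "household"),
    ("u3", "2023-01-15", "electronics"),
    ("u3", "2023-01-30", "toys"),
]
pairs = build_pairs(purchases)
baseline = fit_baseline(pairs)
print(baseline[(1, "electronics")])  # household
```

A model that cannot beat this baseline on held-out users is probably not learning anything useful from the extra features.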

Incorporating prior knowledge to machine learning models

Say I have a data set of students with features such as income level, gender, parents' education levels, school, etc., and the target variable is, say, passing or failing a national exam. We can train a machine learning model to predict, given these values, whether a student is likely to pass or fail (in sklearn, using predict_proba, we can get the probability of passing).
Now say I have a different set of information, which has nothing to do with the previous data set. It lists the schools and the percentage of students from each school who passed that national exam last year and in the years before: say, schoolA: 10%, schoolB: 15%, etc.
How can I use this additional knowledge to improve my model? This data is surely valuable. (Students from certain schools have a higher chance of passing the exam due to their educational facilities, qualified staff, etc.)
Do I somehow add this information as a new feature to the data set? If so, what is the recommended way? Or do I use this information after the model prediction and somehow combine the two to get a final probability? Obviously an average or a weighted average doesn't work, because the second data set has probabilities in the range below 20%, which drags the total probability very low. How do data scientists usually incorporate this kind of prior knowledge? Thank you

You can try different ways of adding this data and see whether your model is able to learn from it. More likely, you'll see right away that the additional data just confuses the model, mostly because you're already providing more precise data on each student of the school, and the model has more freedom to use that information.
But artificial neural network training is all about continuous trial and error, so you definitely should try training it with all the data you can think of, to see whether it reaches a decent error in the end.

Using the average pass percentage of each student's school as a new feature for that student is worth trying.
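A minimal sketch of that feature join, assuming the student records are dictionaries with a `school` key. The record layout, the function name, and the fallback for schools missing from the second data set (the mean of the known rates) are all assumptions made for illustration.

```python
def add_school_pass_rate(students, school_pass_rate):
    """Return copies of the student records with a 'school_pass_rate' feature added."""
    # Assumed fallback: schools missing from the second data set get the mean known rate.
    default_rate = sum(school_pass_rate.values()) / len(school_pass_rate)
    return [
        {**student, "school_pass_rate": school_pass_rate.get(student["school"], default_rate)}
        for student in students
    ]

# Pass rates from the question, expressed as fractions.
school_pass_rate = {"schoolA": 0.10, "schoolB": 0.15}

# Hypothetical student records.
students = [
    {"income_level": "low", "school": "schoolA"},
    {"income_level": "high", "school": "schoolB"},
    {"income_level": "mid", "school": "schoolC"},  # school absent from the pass-rate data
]
enriched = add_school_pass_rate(students, school_pass_rate)
print(enriched[0])
```

Because the model learns its own weight for the new column, the low absolute values (below 20%) do not drag predictions down the way a plain average of probabilities would.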

Model that predicts both categorical and numerical output

I am building an RNN for a time series model, which has a categorical output.
For example, if the previous pattern is "A", "B", "A", "B", the model predicts that the next item is "A".
There is also a numerical level associated with each category.
For example, A is 100 and B is 50,
so the sequence is A(100), B(50), A(100), B(50).
I have the model framework to predict that the next item is "A"; it would be nice to predict the (100) at the same time.
For a real-life example, take national weather data.
You are predicting the weather type for the next few days (sunny, windy, raining, etc.), and it would be nice if the model also predicted the temperature.
Or, for Amazon, analysing a customer's transaction pattern:
Customer A shopped in the categories
electronic ($100), household ($10), ...
Predict which category this customer's next transaction is likely to fall in and, at the same time, the amount of that transaction.
Researched a bit, have not found any relevant research on similar topics.

What is stopping you from adding an extra output to your model? You could have one categorical output and one numerical output next to each other. Every neural network library out there supports multiple outputs.
You will need to normalise your output data, though. Categories should be normalised with one-hot encoding, and numerical values should be normalised by dividing by some maximal value.
You wrote: "Researched a bit, have not found any relevant research on similar topics."
That is because this is not really a 'topic'. This is something completely normal, and it does not require some special kind of network.
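The output normalisation described above can be sketched without any neural network library, using the A(100)/B(50) sequence from the question. The function names are hypothetical. In a two-output network, the one-hot target would typically feed a softmax head trained with cross-entropy and the scaled level a linear head trained with mean squared error, but that pairing is an assumption, not something the answer specifies.

```python
def one_hot(category, categories):
    """Encode a category as a one-hot list over a fixed category order."""
    return [1.0 if category == c else 0.0 for c in categories]

def make_targets(sequence):
    """sequence: list of (category, level) pairs.
    Returns (categories, max_level, targets), where each target is
    (one_hot_vector, level scaled into [0, 1] by the maximum level)."""
    categories = sorted({category for category, _ in sequence})
    max_level = max(level for _, level in sequence)
    targets = [
        (one_hot(category, categories), level / max_level)
        for category, level in sequence
    ]
    return categories, max_level, targets

sequence = [("A", 100), ("B", 50), ("A", 100), ("B", 50)]
categories, max_level, targets = make_targets(sequence)
print(targets[0])  # ([1.0, 0.0], 1.0)
print(targets[1])  # ([0.0, 1.0], 0.5)
```

A predicted level is mapped back to the original scale by multiplying it by `max_level`.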

Is it unsupervised learning if I don't have labeled data to train my models on

I would like to collect user information to determine whether they are male or female. I have zero labeled data for my users, but I know some features that can easily predict their gender. An example would be texts created by the users that contain words strongly associated with one gender (ex: Male: beer, football game, boxers. Female: facial, makeup, bra).
Would this be considered unsupervised learning, since I don't have labelled data to train my models on?

This is neither supervised nor unsupervised. You are just applying some predefined rules to classify between male and female.
This is also not machine learning, because you don't use any learning method...

A supervised learning method would use all of the text used by the users and allow the machine to determine which words are important and by how much by trying to guess the user's gender and then correcting itself with the label.
An unsupervised method would be to provide the machine with all the text by the users and allow it to try and create different pattern groups out of it. However, there are many more ways to group users than 'male' and 'female' so this is not exactly an ideal unsupervised problem.
Telling the system which words are important and separating the system into groups based on that would just be a regular program and can be accomplished by any programming language that can match text and provide an output.
pro·gram
noun
1.
a planned series of future events, items, or performances.
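For contrast with a learned model, the "regular program" described in this answer could be as small as the sketch below. The keyword lists are adapted from the question's example and are deliberately crude; the function name and the tie-handling rule are assumptions. Nothing here is learned from data, which is the point being made.

```python
import re

# Keyword lists adapted from the question's example; hand-written, not learned.
KEYWORDS = {
    "male": {"beer", "football", "boxers"},
    "female": {"facial", "makeup", "bra"},
}

def rule_based_label(text):
    """Return the label whose keywords occur most often, or None on a tie or no match."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        label: sum(word in keywords for word in words)
        for label, keywords in KEYWORDS.items()
    }
    best = max(scores.values())
    winners = [label for label, score in scores.items() if score == best]
    # Assumed rule: refuse to guess unless exactly one label has matches and leads.
    return winners[0] if best > 0 and len(winners) == 1 else None

print(rule_based_label("Watching the football game with a beer"))  # male
print(rule_based_label("hello world"))  # None
```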
