Use a generative model or a discriminative model? - machine-learning

I have spent a lot of time figuring out the differences between generative and discriminative models. Generative models seem more useful because they are capable of more than prediction, but it is said that discriminative models often outperform generative models. So in which cases should we apply a generative model, and when should we turn to a discriminative model?

A generative model is able to sample from the underlying distribution. It learns P(x, y).
A discriminative model is able to distinguish classes. It learns P(y|x).
Often, it is not necessary to be able to sample from the distribution. You only need to be able to discriminate. And discriminators are usually simpler to train.
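To make the distinction concrete, here is a minimal sketch using only the Python standard library. The toy 1-D data, the Gaussian class-conditional assumption, and all function names are illustrative assumptions, not something from the original answer. The fitted model stores P(y) and P(x | y), i.e. the joint P(x, y), so it can both sample new points and derive P(y | x) with Bayes' rule; a discriminative model would learn only the latter.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)

# Toy 1-D data (made up): class 0 centred near 0, class 1 centred near 4.
data = [(random.gauss(0.0, 1.0), 0) for _ in range(200)] + \
       [(random.gauss(4.0, 1.0), 1) for _ in range(200)]

# Generative fit: estimate P(y) and P(x | y), which together give P(x, y).
params = {}
for label in (0, 1):
    xs = [x for x, y in data if y == label]
    params[label] = {
        "prior": len(xs) / len(data),
        "mu": mean(xs),
        "sigma": stdev(xs),
    }


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def joint(x, label):
    """P(x, y) = P(y) * P(x | y) under the fitted model."""
    p = params[label]
    return p["prior"] * gaussian_pdf(x, p["mu"], p["sigma"])


def posterior(x, label):
    """P(y | x) derived via Bayes' rule; a discriminative model learns this directly."""
    return joint(x, label) / sum(joint(x, c) for c in params)


def sample():
    """Draw a new (x, y) pair, which is only possible with the joint distribution."""
    labels = list(params)
    label = random.choices(labels, weights=[params[c]["prior"] for c in labels])[0]
    return random.gauss(params[label]["mu"], params[label]["sigma"]), label


new_x, new_y = sample()       # generation: needs P(x, y)
p_class1 = posterior(2.5, 1)  # discrimination: needs only P(y | x)
```

If you only ever call something like `posterior`, the sampling machinery is unused, which is the situation the answer describes.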

Related

How to handle imbalanced data in general

I have been working on a case study where the data is highly imbalanced. We have been taught that we can handle imbalanced data by either undersampling the majority class or oversampling the minority class.
Is there any other method that can be used to handle imbalanced data?
This question is more conceptual than programming-related.
For example, I was thinking we could put some weight on the minority class (conceptually) to make the model emphasize identifying patterns in the minority class.
I don't know how that can be done, but the concept should work in theory.
Feel free to suggest crazy ideas too.
Your weights idea is not far off. This can be done. In fact, most sklearn models give you the option to specify class weights. However, this is often not enough for very extreme cases (e.g. a 95%/5% split or more extreme).
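As a rough illustration of what class weights look like, the sketch below computes inverse-frequency weights in plain Python. It follows the formula scikit-learn documents for its "balanced" setting, n_samples / (n_classes * count); the helper name `balanced_class_weights` and the 95/5 toy labels are assumptions made for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Toy labels with a 95%/5% split (made up for illustration).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

# After weighting, both classes contribute the same total weight to the loss.
total_per_class = {c: weights[c] * labels.count(c) for c in weights}
```

A dictionary like `weights` can typically be passed as the `class_weight` argument of scikit-learn estimators that support it, or you can pass `class_weight="balanced"` and let the library compute it.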
There are specific oversampling techniques such as SMOTE (and related techniques) which go a step further than classic oversampling and generate synthetic samples based on a K Nearest Neighbors Algorithm.
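The core idea behind SMOTE can be sketched in a few lines: pick a minority sample, pick one of its k nearest minority neighbours, and create a new point somewhere on the line segment between them. This is a simplified illustration with made-up data and a hypothetical function name, not the implementation found in any library (in practice you would probably reach for a maintained package instead).

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority-class samples in 2-D (made up for illustration).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]
new_points = smote_like(minority, n_new=10)
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies instead of merely duplicating existing rows.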
If the classes are extremely imbalanced a "classic" classification approach may not be enough and you might have to look into anomaly detection algorithms.
Just use a strong, consistent classifier. See https://arxiv.org/abs/2201.08528
Generally speaking, before diving into a technical solution (under/upsampling, SMOTE, ...), I think you need to consider the business KPI you are predicting and whether there is a proxy that could help reduce the disparity between the classes.
You can also think about models that have weight parameters and can penalize the majority class.
You can check this article, it explains from a conceptual point of view how to deal with imbalanced data in general.

How to evaluate machine learning model performance on brand new datasets, in addition to train, validation, and test datasets?

The Scenario:
Our data science team builds machine learning models for classification tasks. We evaluate our model performance on train, validation and test datasets. We use precision, recall and F1 score.
We then run the models on brand-new datasets in the production environment and make predictions. One week later, we get feedback on how well our predictive models have performed.
The question:
When we evaluate the performance of our models on the real datasets, what metrics should we use? Is prediction accuracy a better metric in this context?
I think you should either measure the same metrics or use some business metrics.
Models are usually optimized for a certain loss or metric, which means that a model with a high value on one metric can have a worse value on a different metric.
Accuracy is heavily influenced by the balance of classes in the data, so it should be used with care.
So I suggest using the same metrics.
Another approach is to use business metrics, for example the revenue that these models brought in.
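To see why accuracy needs care on imbalanced data, here is a small sketch that scores a "model" which always predicts the majority class. The labels are made up and the helper `binary_metrics` is a hypothetical name; in a real project you would more likely use a library's metric functions.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 5 positives and 95 negatives; the predictor never flags a positive.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
metrics = binary_metrics(y_true, always_negative)
```

Here accuracy looks excellent while recall and F1 are zero, so a team that tracked precision, recall and F1 offline would be misled by switching to accuracy in production.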
Model evaluation
Take a look at this paper. It is fairly easy to follow and covers everything you need to know about machine learning model validation.

Why do we use metric learning when we can classify

So far, I have read some highly cited metric learning papers. The general idea of such papers is to learn a mapping such that mapped data points with the same label lie close to each other and far from samples of other classes. To evaluate such techniques, they report the accuracy of a KNN classifier on the generated embedding. So my question is: if we have a labelled dataset and we are interested in increasing the accuracy of a classification task, why don't we learn a classifier on the original data points? I mean, instead of finding a new embedding that suits a KNN classifier, we can learn a classifier that fits the (not embedded) data points. Based on what I have read so far, the classification accuracy of such classifiers is much better than that of metric learning approaches. Is there a study showing that metric learning + KNN performs better than fitting a (good) classifier, at least on some datasets?
Metric learning models CAN BE classifiers. So I will answer the question of why we need metric learning for classification.
Let me give you an example. Suppose you have a dataset with millions of classes, and some classes have only a few examples, say fewer than 5. If you use classifiers such as SVMs or ordinary CNNs, you will find it impossible to train, because those classifiers (discriminative models) will totally ignore the classes with few examples.
But for metric learning models this is not a problem, since they are based on generative models.
By the way, a large number of classes is itself a challenge for discriminative models.
Real-life challenges inspire us to explore better models.
As @Tengerye mentioned, you can use models trained using metric learning for classification. KNN is the simplest approach, but you can take the embeddings of your data and train another classifier, be it KNN, SVM, a neural network, etc. The use of metric learning, in this case, is to change the original input space to another one that is easier for a classifier to handle.
Apart from being hard to train when data is unbalanced or, even worse, when there are very few examples per class, discriminative models cannot be easily extended to new classes.
Take facial recognition, for example. If facial recognition models are trained as classification models, they only work for the faces they have seen and won't work for any new face. Of course, you could add images of the faces you wish to add and retrain or fine-tune the model if possible, but this is highly impractical. On the other hand, facial recognition models trained using metric learning can generate embeddings for new faces, which can be easily added to the KNN, and your system can then identify the new person given his/her image.
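The "add a new face without retraining" point can be sketched as a nearest-neighbour lookup over stored embeddings. Everything here is an assumption for illustration: the 2-D vectors stand in for the output of some already trained embedding model, and `EmbeddingGallery` is a hypothetical helper, not part of any library.

```python
import math


class EmbeddingGallery:
    """Stores (embedding, identity) pairs and identifies queries by nearest neighbour."""

    def __init__(self):
        self.items = []

    def enroll(self, embedding, identity):
        # Adding a person is just appending a vector; the embedding model is untouched.
        self.items.append((tuple(embedding), identity))

    def identify(self, embedding):
        # 1-nearest-neighbour lookup using Euclidean distance.
        return min(self.items, key=lambda item: math.dist(item[0], embedding))[1]


gallery = EmbeddingGallery()
gallery.enroll((0.9, 0.1), "alice")
gallery.enroll((0.1, 0.9), "bob")

# A new person appears later: enroll one embedding, no retraining needed.
gallery.enroll((-0.8, -0.7), "carol")
who = gallery.identify((-0.7, -0.8))
```

With a softmax classifier, adding "carol" would instead mean changing the output layer and retraining or fine-tuning, which is the impracticality the answer points out.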

Collecting Machine learning training data

I am very new to machine learning and need a couple of things clarified. I am trying to predict the probability of someone liking an activity based on their Facebook likes. I am using the Naive Bayes classifier, but am unsure about a couple of things:
1. What would my labels/inputs be?
2. What info do I need to collect for training data? My guess is to create a survey with questions on whether the person would enjoy an activity (on a scale from 1-10).
In supervised classification, all classifiers need to be trained with known labeled data; this data is known as training data. Your data should consist of a vector of features followed by a special one called the class. In your problem, the class is whether or not the person has enjoyed the activity.
Once you train the classifier, you should test its behavior with another dataset so that the evaluation is not biased. This dataset must have class labels, just like the training data. If you train and test with the same dataset, your classifier's predictions may look really good, but the evaluation is unfair.
I suggest you take a look at evaluation techniques like K-Fold Cross Validation.
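For reference, the splitting step of k-fold cross validation can be sketched as below. This is a minimal standard-library version with a hypothetical helper name; libraries such as scikit-learn and Weka ship their own, more complete implementations.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 10 samples, 5 folds: each sample is used for testing exactly once.
splits = list(k_fold_indices(10, 5))
```

You would train on `train_idx` and evaluate on `test_idx` for each split, then average the scores, so no sample is ever used to both fit and score the same model.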
Another thing you should know is that the common Naïve Bayes classifier is used to predict binary data, so your class should be 0 or 1, meaning that the person you surveyed did or did not enjoy the activity. It is also implemented in packages like Weka (Java) and scikit-learn (Python).
If you are really interested in Bayesian classifiers, I should say that Naïve Bayes for binary classification is in fact not the best one, because Minsky discovered in 1961 that its decision boundaries are hyperplanes. Its Brier score is also really bad, and it is said that this classifier is not well calibrated. But it makes good predictions after all.
Hope it helps.
This may be fairly difficult with Naive Bayes. You'll need to collect (or calculate) samples of whether or not a person likes activity X, and also details on their Facebook likes (organized in some consistent way).
Basically, for Naive Bayes, your training data should be the same data type as your testing data.
The survey approach may work, if you have access to each person's Facebook like history.
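As a sketch of how the collected data could be organized, each person becomes one row of 0/1 features (one column per Facebook page, 1 if they liked it) plus a 0/1 label from the survey. The page names, the tiny dataset, and the function names below are all made up for illustration; this is a bare-bones Bernoulli Naive Bayes with Laplace smoothing, not a substitute for a library implementation.

```python
import math


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate class priors and per-feature like-probabilities with Laplace smoothing."""
    n_features = len(rows[0])
    model = {}
    for c in set(labels):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        model[c] = {
            "log_prior": math.log(len(class_rows) / len(rows)),
            "p": [
                (sum(r[j] for r in class_rows) + alpha) / (len(class_rows) + 2 * alpha)
                for j in range(n_features)
            ],
        }
    return model


def predict_proba(model, row):
    """Return P(class | row) for every class, computed in log space for stability."""
    log_scores = {}
    for c, m in model.items():
        score = m["log_prior"]
        for x, p in zip(row, m["p"]):
            score += math.log(p if x else 1.0 - p)
        log_scores[c] = score
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


# Columns (hypothetical pages): "Hiking Club", "Outdoor Gear", "Video Games".
X = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 1), (0, 0, 1)]
# Survey answer: 1 = enjoyed the hiking activity, 0 = did not.
y = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(X, y)
proba = predict_proba(model, (1, 1, 0))  # a new person who likes the first two pages
```

The value `proba[1]` is the predicted probability that the new person enjoys the activity, which is the quantity the question asks for. A 1-10 survey scale would first need to be thresholded into 0/1 to fit this setup.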

Multi-Class Classification in WEKA

I am trying to implement Multiclass classification in WEKA.
I have a lot of rows, say bank transactions, and each one is tagged as Food, Medicine, Rent, etc. I want to develop a classifier that can be trained on the previous data I have and predict the class of future transactions. If I am right, this is multiclass and not multilabel, since each transaction can belong to only one class.
Below are a few algorithms I am considering:
Naive Bayes
Multinomial Logistic Regression
Multiclass SVM
Max Entropy
Neural Networks (if possible)
In my data, the number of features <<< the number of transactions, and hence I am thinking of a one-vs-rest binary classifier instead of one-vs-one.
Are there any other algorithms I should look into that would help with my goal?
Are any of the algorithms I listed useless for my goal?
Also, I found that scikit-learn in Python is better than WEKA, but I can run scikit-learn on only one processor. Is this true?
Answers to any question would be helpful.
Thanks!
You can look at RandomForest, which is a well-known classifier and quite efficient.
In scikit-learn, some classes, such as RandomForestClassifier, can run across several cores. It has a constructor parameter that sets the number of cores to use, or a value that uses every available core. Look at the documentation: estimators whose constructor has an n_jobs parameter can run across several cores.
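A minimal sketch of what that looks like, assuming scikit-learn is installed. The synthetic dataset from `make_classification` merely stands in for the tagged transactions, and the specific sizes are arbitrary choices for this example. Setting `n_jobs=-1` asks the estimator to use all available cores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for labelled transactions (illustrative only).
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=5,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_jobs=-1 uses every available core; a positive integer sets the number of cores.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
```

Random forests handle multiclass targets natively, so no explicit one-vs-rest wrapper is needed for this model.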
