I am working on a document which should contain the key differences between using Naive Bayes (generative) and Logistic Regression (discriminative) models for text classification.
During my research, I ran into this definition for Naive Bayes model: https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
The probability of a document d being in class c is computed as P(c|d) ∝ P(c) · ∏ P(t_k|c) (the product running over the terms t_k in d), where P(t_k|c) is the conditional probability of term t_k occurring in a document of class c...
When I got to the part comparing generative and discriminative models, I found this accepted answer on StackOverflow: What is the difference between a Generative and Discriminative Algorithm?
A generative model learns the joint probability distribution p(x,y) and a discriminative model learns the conditional probability distribution p(y|x) - which you should read as "the probability of y given x".
At this point I got confused: Naive Bayes is a generative model yet it uses conditional probabilities, while the discriminative models were described as learning conditional probabilities, as opposed to the joint probabilities learned by generative models.
Can someone shed some light on this please?
Thank you!
It is generative in the sense that you don't directly model the posterior p(y|x); rather, you learn a model of the joint probability p(x,y), which can also be expressed as p(x|y) * p(y) (likelihood times prior), and then, through Bayes' rule, you find the most probable y.
A good read I can recommend in this context is: "On Discriminative vs. Generative classifiers: A comparison of logistic regression and naive Bayes"
(Ng & Jordan 2004)
In the TensorFlow library, what does the tf.estimator.LinearClassifier class do in linear regression models? (In other words, what is it used for?)
LinearClassifier is essentially logistic regression.
According to the TensorFlow documentation, tf.estimator.LinearClassifier is used to "Train a linear model to classify instances into one of multiple possible classes. When number of possible classes is 2, this is binary classification."
Linear regression predicts a value while the linear classifier predicts a class. Classification aims at predicting the probability of each class given a set of inputs.
For an implementation of tf.estimator.LinearClassifier, please follow this tutorial by guru99.
To learn more about linear classifiers, read this article.
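For a rough idea of how it is wired up, here is a minimal sketch (the feature names and the tiny dataset are made up; the tf.estimator API is assumed):

```python
# Minimal sketch of tf.estimator.LinearClassifier (feature names and data are made up).
import numpy as np
import tensorflow as tf

# Two numeric features; the estimator learns one weight per feature plus a bias,
# i.e. a logistic-regression-style linear model.
feature_columns = [
    tf.feature_column.numeric_column("age"),
    tf.feature_column.numeric_column("income"),
]

def input_fn():
    features = {"age": np.array([25., 40., 60., 35.], dtype=np.float32),
                "income": np.array([30., 80., 20., 55.], dtype=np.float32)}
    labels = np.array([0, 1, 0, 1])
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# n_classes=2 -> binary classification; raise it for multiclass problems.
classifier = tf.estimator.LinearClassifier(feature_columns=feature_columns, n_classes=2)
classifier.train(input_fn=input_fn)       # trains until the input is exhausted
print(classifier.evaluate(input_fn=input_fn))
```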
In the documentation of Scikit-Learn Decision Trees, it is stated that:
Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
What is meant by non-parametric supervised learning?
Non-parametric is the opposite of parametric. In a parametric learning model, you can describe the hypothesis set (or learning model) with a finite number of parameters, as with an SVM, for example.
Hence, a non-parametric model can be seen as a model requiring a potentially infinite number of parameters to describe, i.e., the distribution of the data cannot be defined by a finite set of parameters [1].
[2] An easy-to-understand non-parametric model is the k-nearest neighbors algorithm, which makes predictions for a new data instance based on the k most similar training patterns. The method does not assume anything about the form of the mapping function other than that patterns which are close are likely to have a similar output value.
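To see the difference concretely, here is a small sketch (scikit-learn class names are my choice for illustration; the data is synthetic): a parametric model such as logistic regression keeps a fixed number of coefficients no matter how much data it sees, while k-nearest neighbors has to keep the training data itself around, so what it stores grows with the data.

```python
# Sketch: parametric vs non-parametric (scikit-learn assumed; data is synthetic).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

for n in (100, 10_000):
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Parametric: always 3 coefficients + 1 intercept, regardless of n.
    param_model = LogisticRegression().fit(X, y)
    n_params = param_model.coef_.size + param_model.intercept_.size

    # Non-parametric: the "model" is essentially the stored training set.
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

    print(f"n={n}: logistic regression stores {n_params} parameters; "
          f"kNN must retain all {n} training points to answer queries")
```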
I am participating in the Kaggle San Francisco Crime competition and I am currently trying a number of different classifiers to establish benchmark performance. I am using LogisticRegression from sklearn, without any parameter tuning, and I noticed from sklearn.metrics.classification_report that it is only predicting the predominant classes, i.e. the classes which have the highest number of occurrences in my training set.
Intuition tells me that this comes down to parameter tuning, but I am not sure which parameters I have to tweak in order to make the classifier more aware of the less predominant classes (LogisticRegression has quite a few). At the moment it is predicting only about 3 classes out of 38, so it definitely needs improvement.
Any ideas?
If your model is only predicting the predominant classes, then you are facing a class imbalance problem. Here are some good reads on tackling this in machine learning.
Logistic regression is a binary classifier and uses a one-vs-rest or one-vs-one technique for multiclass classification, which is not good if you have a large number of output classes (38 in your case). Try using another classifier. For a start, use a softmax classifier, which is an extension of the logistic classifier with support for multi-class classification. In scikit-learn, set the multi_class parameter to "multinomial" to use softmax regression.
Another way to improve your model is to use a grid search (GridSearchCV in scikit-learn) for parameter tuning.
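As a rough sketch of how those two suggestions fit together (scikit-learn assumed; X_train and y_train are placeholders for your already-vectorized features and the 38 crime categories):

```python
# Sketch combining multinomial logistic regression, class weighting and a grid search.
# X_train / y_train are placeholders for your own prepared data.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

base = LogisticRegression(
    multi_class="multinomial",   # softmax regression instead of one-vs-rest
    solver="lbfgs",              # a solver that supports the multinomial loss
    class_weight="balanced",     # re-weight classes to counter the imbalance
    max_iter=1000,
)

grid = GridSearchCV(
    base,
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # regularization strength
    scoring="neg_log_loss",                    # multi-class log loss, as far as I remember the competition's metric
    cv=3,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```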
On a side note, I would also recommend trying other models.
When we train on a training set using a decision tree classifier, we get a tree model. This model can be converted to rules and incorporated into Java code.
Now, if I train on the training set using Naive Bayes, what form does the model take? And how can I incorporate the model into my Java code?
If no model results from the training, then what is the difference between Naive Bayes and a lazy learner (e.g. kNN)?
Thanks in advance.
Naive Bayes constructs estimates of the conditional probabilities P(f_1,...,f_n|C_j), where f_i are features and C_j are classes. Using Bayes' rule together with estimates of the priors P(C_j) and the evidence P(f_i), these can be translated into x = P(C_j|f_1,...,f_n), which can be roughly read as "given the features f_i, I think they describe an object of class C_j and my certainty is x". In fact, NB assumes the features are independent, so it actually uses simple probabilities of the form x = P(f_i|C_j), i.e. "given f_i, I think it is C_j with probability x".
So the form of the model is a set of probabilities:
- conditional probabilities P(f_i|C_j) for each feature f_i and each class C_j
- priors P(C_j) for each class
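Concretely, that means the trained model is just a table of numbers you could dump to a file and reuse from your Java code. A minimal sketch with scikit-learn's MultinomialNB (the library choice, the JSON export and the X/y variables are my illustration, not something from the question):

```python
# Sketch: the Naive Bayes "model" is just stored (log-)probabilities that you
# could export to a file and re-use from Java. scikit-learn assumed for illustration.
import json
from sklearn.naive_bayes import MultinomialNB

# X: feature counts, y: class labels (assumed to exist already).
nb = MultinomialNB().fit(X, y)

model = {
    "classes": nb.classes_.tolist(),
    "log_priors": nb.class_log_prior_.tolist(),       # log P(C_j)
    "log_likelihoods": nb.feature_log_prob_.tolist(),  # log P(f_i|C_j), one row per class
}

# Prediction in any language is then:
#   argmax_j  log P(C_j) + sum_i count_i * log P(f_i|C_j)
with open("nb_model.json", "w") as f:
    json.dump(model, f)
```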
KNN, on the other hand, is something completely different. It is not a "learned model" in the strict sense, as you don't fit any parameters. It is rather a classification algorithm which, given a training set and a number k, simply answers the question "For a given point x, what is the majority class among the k nearest points in the training set?".
The main difference is in the input data. Naive Bayes works on objects that are "observations", so you simply need some features which are either present in the classified object or absent. It does not matter whether the feature is a color, an object in a photo, a word in a sentence, or an abstract concept in a highly complex topological object. KNN, by contrast, is a distance-based classifier, which requires that you can measure distances between the objects you classify. So in order to classify abstract objects you first have to come up with some metric (distance measure) that describes their similarity, and the results will be highly dependent on that definition. Naive Bayes, on the other hand, is a simple probabilistic model which does not use the concept of distance at all. It treats all features in the same way: they are there or they aren't, end of story (of course it can be generalized to continuous variables with a given density function, but that is not the point here).
Naive Bayes will construct/estimate the probability distribution from which your training samples were generated.
Now, given this probability distribution for all your output classes, you take a test sample, and depending on which class has the highest probability of generating this sample, you assign the test sample to that class.
In short, you run the test sample through all the probability distributions (one for each class) and calculate, for each particular distribution, the probability of it generating this test sample.
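A tiny numeric sketch of that procedure (one Gaussian per class as a stand-in for "the probability distribution of each class"; every number here is made up):

```python
# Sketch of generative classification: score a test sample under each class's
# estimated distribution and pick the class with the highest probability.
import numpy as np
from scipy.stats import norm

class_params = {            # in a real setting these are estimated from training data
    "A": {"prior": 0.5, "mean": 0.0, "std": 1.0},
    "B": {"prior": 0.3, "mean": 3.0, "std": 1.0},
    "C": {"prior": 0.2, "mean": 6.0, "std": 2.0},
}

x_test = 2.4
scores = {c: p["prior"] * norm.pdf(x_test, p["mean"], p["std"])
          for c, p in class_params.items()}
print(scores)                       # unnormalized probability of each class generating x_test
print(max(scores, key=scores.get))  # assign the test sample to the best-scoring class
```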
I am looking for a probabilistic classifier in OpenCV which, during prediction, returns the probabilities or membership values of the various classes.
I have looked into SVMs, ANNs and the Normal Bayes Classifier (which is probabilistic), but all of these classifiers return a discrete class for a given input.
For example, if I have classes {A, B, C} and I give an input {X}, I need the membership values of the classes, like A=0.2, B=0.1, C=0.7. Right now, with these existing classifiers, I am getting a discrete output (e.g. C).
Cheers.
In this paper you will find heuristics for producing multi-class probabilistic outputs from an SVM.
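If stepping outside OpenCV is an option, scikit-learn's SVC exposes this kind of heuristic (Platt-style calibration, extended to the multi-class case) through predict_proba; a minimal sketch with the three classes from the question, on made-up data:

```python
# Sketch: per-class membership values from an SVM, if moving outside OpenCV is
# acceptable. scikit-learn's SVC(probability=True) calibrates probabilities
# internally; the data and class labels here are made up.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [1.2, 0.9], [2.0, 2.1], [2.2, 1.9]])
y = np.array(["A", "A", "B", "B", "C", "C"])

clf = SVC(probability=True).fit(X, y)

x_new = [[1.1, 1.0]]
print(clf.classes_)              # class order, e.g. ['A' 'B' 'C']
print(clf.predict_proba(x_new))  # membership values, e.g. something like [[0.2 0.7 0.1]]
print(clf.predict(x_new))        # the single discrete class, if you still want it
```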