I'm starting to study machine learning, and in some articles I've seen the ROC curve used only for binary classification.
Can I use the ROC curve in multiclass classification and measure my AUC?
ROC curves are typically used in binary classification, but they can also be used for multiclass classification if you binarize the output (one-vs-rest).
You can find a worked multiclass example here:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
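For instance, a minimal sketch of the binarize-then-score approach along the lines of that example, using the iris dataset and a logistic regression purely for illustration:

```python
# One-vs-rest ROC/AUC on a multiclass problem: binarize the labels,
# then score each class against the rest.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)                 # shape (n_samples, n_classes)
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])

# One ROC curve / AUC per class (one-vs-rest)
for k in range(3):
    fpr, tpr, _ = roc_curve(y_test_bin[:, k], y_score[:, k])
    print(f"class {k}: AUC = {roc_auc_score(y_test_bin[:, k], y_score[:, k]):.3f}")

# Or a single macro-averaged one-vs-rest AUC in recent scikit-learn versions
print("macro OvR AUC:", roc_auc_score(y_test, y_score, multi_class='ovr'))
```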
Related
In a classification problem, I cannot use a simple logit model if my data label (i.e., the dependent variable) has more than two categories. That leaves me with multinomial regression, Linear Discriminant Analysis (LDA), and the like. Why is multinomial logit not as popular as LDA in machine learning? What particular advantage does LDA offer?
I am a beginner in the machine learning field and I want to learn how to do multiclass classification with Gradient Boosting Trees (GBT). I have read some articles about GBT, but only for regression problems, and I couldn't find a proper explanation of GBT for multiclass classification. I also checked GBT in the scikit-learn library. The implementation is GradientBoostingClassifier, which uses regression trees as the weak learners for multiclass classification.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.
Source: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier
The thing is, why do we use regression trees as our learners for GBT instead of classification trees? It would be very helpful if someone could explain why regression trees are used rather than classification trees, and how a regression tree can do classification. Thank you.
You are interpreting 'regression' too literally here (as numeric prediction), which is not the case; remember that classification can be handled with logistic regression. See, for example, the entry for loss in the documentation page you have linked:
loss : {‘deviance’, ‘exponential’}, optional (default=’deviance’)
loss function to be optimized. ‘deviance’ refers to deviance (= logistic regression) for classification with probabilistic outputs. For loss ‘exponential’ gradient boosting recovers the AdaBoost algorithm.
So, a 'classification tree' is just a regression tree with loss='deviance'...
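If you want to verify this yourself, a quick sketch (iris used purely for illustration) shows that on a 3-class problem the fitted ensemble contains n_classes_ regression trees per boosting stage:

```python
# On a 3-class problem, estimators_ has shape (n_stages, n_classes), and each
# entry is a DecisionTreeRegressor, even though the task is classification.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
gbt = GradientBoostingClassifier(n_estimators=10).fit(X, y)

print(gbt.estimators_.shape)        # (10, 3): 10 stages x 3 classes
print(type(gbt.estimators_[0, 0]))  # ...DecisionTreeRegressor
```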
The Naive Bayes algorithm assumes independence among features. What are some text classification algorithms which are not naive, i.e., do not assume independence among their features?
The answer is very straightforward, since nearly every classifier (besides Naive Bayes) is not naive. Feature independence is a very rare assumption, and it is not made by (among a long list of others):
logistic regression (known in the NLP community as the maximum entropy model)
linear discriminant analysis (Fisher's linear discriminant)
kNN
support vector machines
decision trees / random forests
neural nets
...
You are asking about text classification, but there is nothing really special about text, and you can use any existing classifier for such data.
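For example, a minimal sketch of text classification with one of the non-naive options above (logistic regression); the training sentences are made up purely for illustration:

```python
# TF-IDF features + logistic regression: no feature-independence assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie, loved it", "terrible plot and acting",
         "an instant classic", "a waste of two hours"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["what a great classic"]))
```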
Could somebody give me an example showing how Platt scaling is used along with k-fold cross-validation in multiclass SVM classification in libsvm?
I have divided the whole dataset into two parts, training and testing. For cross-validation, I partition the training data such that one partition is used for testing and the rest for training the multiclass SVM classifier.
Platt scaling has nothing to do with your partitioning or the multiclass setting. Platt scaling is an internal technique of each individual binary SVM, and it uses only the training data. It is actually just fitting a logistic regression on top of your learned SVM decision values.
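For instance, with scikit-learn's SVC (a wrapper around libsvm), setting probability=True turns on Platt scaling; libsvm fits the sigmoid using its own internal cross-validation on the training data, separate from whatever outer k-fold split you use for evaluation. A minimal sketch, with iris standing in for your data:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# probability=True: libsvm fits a sigmoid (Platt scaling) to each binary
# SVM's decision values via an internal CV on the training set.
svm = SVC(kernel='rbf', probability=True).fit(X, y)
print(svm.predict_proba(X[:3]))  # calibrated per-class probabilities
```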
What is the exact difference between a Support Vector Machine classifier and Support Vector Machine regression?
The one-sentence answer is that an SVM classifier performs binary classification and SVM regression performs regression.
While performing very different tasks, they are both characterized by the following points:
usage of kernels
absence of local minima
sparseness of the solution
capacity control obtained by acting on the margin
number of support vectors, etc.
For SVM classification, the hinge loss is used; for SVM regression, the epsilon-insensitive loss function is used.
SVM classification is more widely used and in my opinion better understood than SVM regression.
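To make the contrast concrete, here is a small sketch on made-up data: SVC predicts discrete labels via the hinge loss, while SVR predicts real values via the epsilon-insensitive loss.

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))

# Classification: binary labels, hinge loss under the hood
y_cls = (X[:, 0] > 0).astype(int)
clf = SVC(kernel='rbf').fit(X, y_cls)
print(clf.predict([[1.5]]))   # -> a class label, e.g. [1]

# Regression: continuous target, epsilon-insensitive loss
y_reg = np.sin(X[:, 0]) + 0.1 * rng.randn(100)
reg = SVR(kernel='rbf', epsilon=0.1).fit(X, y_reg)
print(reg.predict([[1.5]]))   # -> a real-valued prediction
```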