Vowpal Wabbit Multiclass Linear Classification

Is it possible to train a multiclass (multinomial) linear classification model with the Vowpal Wabbit library?
I tried to use --oaa with --loss_function squared, but it seems that the default loss function for --oaa is logistic.
I am using rcv1.multiclass as input.
One Solution:
I can create multiple versions of the data as follows:
Version i: make all the labels zero except class i.
Then I can train a binary classifier for each version of the data. Finally, I can feed the test data to all the classifiers and apply an argmax. Is there any better (automated) solution?

When you use vw --oaa N, you will actually get a linear N-class classifier. To get a non-linear classifier you would need to add quadratic/polynomial features (-q, --cubic, --interactions) or kernels (--ksvm) or a hidden layer (--nn) or any other nonlinear reduction (--lrq, --stage_poly, --autolink).
The choice of loss function does not affect whether the classifier is linear or not. The default is --loss_function=squared. For classification, I would suggest using --loss_function=logistic (possibly with --probabilities if you want to predict the probability of each class) or --loss_function=hinge (if you care only about the top class).
Then I can train a binary classifier for each version of the data. Finally, I can feed the test data to all the classifiers and apply an argmax. Is there any better (automated) solution?
Yes, this is exactly what --oaa does (but more efficiently).
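For reference, here is a minimal sketch of that one-liner driven from Python via subprocess. The file names, the class count of 3, and the example feature line are illustrative assumptions, not part of the question's rcv1.multiclass setup; only the vw flags themselves come from the answer above.

```python
import subprocess

# Train a one-vs-all multiclass model with the vw CLI.
# Assumes vw is on the PATH and train.vw holds multiclass-format data,
# e.g. lines like "2 | price:0.5 sqft:1.2" with labels in 1..3.
subprocess.run(
    ["vw", "--oaa", "3", "--loss_function", "logistic",
     "-d", "train.vw", "-f", "model.vw"],
    check=True,
)

# Predict class labels for held-out data with the trained model.
subprocess.run(
    ["vw", "-t", "-i", "model.vw", "-d", "test.vw", "-p", "predictions.txt"],
    check=True,
)
```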

Related

What Does tf.estimator.LinearClassifier() Do?

In the TensorFlow library, what does the tf.estimator.LinearClassifier class do in linear models? (In other words, what is it used for?)
Linear Classifier is nothing but Logistic Regression.
According to the TensorFlow documentation, tf.estimator.LinearClassifier is used to "Train a linear model to classify instances into one of multiple possible classes. When number of possible classes is 2, this is binary classification."
Linear regression predicts a value while the linear classifier predicts a class. Classification aims at predicting the probability of each class given a set of inputs.
For an implementation of tf.estimator.LinearClassifier, follow this tutorial by guru99.
To learn more about linear classifiers, read this article.
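As a rough sketch of the API, assuming a TensorFlow version that still ships the (deprecated) estimator module, training a LinearClassifier looks like the following. The feature name "x", the toy data, and the step count are made-up for illustration.

```python
import tensorflow as tf

# One numeric feature "x" holding two values per example.
feature_columns = [tf.feature_column.numeric_column("x", shape=(2,))]

classifier = tf.estimator.LinearClassifier(
    feature_columns=feature_columns,
    n_classes=2,  # 2 => binary classification; >2 => multiclass
)

def train_input_fn():
    # Four toy points with binary labels (an AND-like pattern).
    features = {"x": [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]}
    labels = [0, 0, 0, 1]
    return tf.data.Dataset.from_tensor_slices((features, labels)).repeat().batch(4)

classifier.train(input_fn=train_input_fn, steps=100)
print(classifier.evaluate(input_fn=train_input_fn, steps=1))
```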

The best loss function for pixelwise binary classification in keras

I built a deep learning model which accepts images of size 250*250*3 and outputs a 62500 (250*250) binary vector which contains 0s in pixels that represent the background and 1s in pixels that represent the ROI.
My model is based on DenseNet121, but when I use softmax as the activation function in the last layer and the categorical cross entropy loss function, the loss is nan.
What are the best loss and activation functions that I can use in my model?
What is the difference between the binary cross entropy and categorical cross entropy loss functions?
Thanks in advance.
What are the best loss and activation functions that I can use in my model?
Use binary_crossentropy, because every output is independent (not mutually exclusive) and can take the values 0 or 1; use sigmoid in the last layer.
Check this interesting question/answer
What is the difference between the binary cross entropy and categorical cross entropy loss functions?
Here is a good set of answers to that question.
Edit 1: My bad, use binary_crossentropy.
After a quick look at the code (again), I can see that Keras uses:
for binary_crossentropy -> tf.nn.sigmoid_cross_entropy_with_logits
(From tf docs): Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.
for categorical_crossentropy -> tf.nn.softmax_cross_entropy_with_logits
(From tf docs): Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
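As a hedged sketch of that advice: the question only says the model is DenseNet121-based and outputs a 62500-dimensional binary vector, so the head below (global pooling plus one Dense layer) is an assumption made purely to show the sigmoid/binary_crossentropy pairing.

```python
from tensorflow import keras

# DenseNet121 backbone without its ImageNet classification head.
backbone = keras.applications.DenseNet121(
    include_top=False, input_shape=(250, 250, 3), weights=None
)
x = keras.layers.GlobalAveragePooling2D()(backbone.output)

# One independent sigmoid per pixel: pairs with binary_crossentropy.
# softmax + categorical_crossentropy would instead assume exactly one
# "hot" entry across all 62500 outputs, which is wrong for a mask.
outputs = keras.layers.Dense(250 * 250, activation="sigmoid")(x)

model = keras.Model(backbone.input, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```

In practice a convolutional decoder that upsamples back to 250*250*1 is a more common segmentation head than one huge Dense layer, but the loss/activation pairing is the same either way.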

Train multi-class classifier for binary classification

Suppose a dataset contains multiple categories, e.g. 0-class, 1-class and 2-class. Now the goal is to divide new samples into 0-class or non-0-class.
One can
combine 1,2-class into a unified non-0-class and train a binary classifier,
or train a multi-class classifier to do binary classification.
How is the performance of these two approaches?
I think more categories will bring about a more accurate discriminant surface; however, the weights of the 1- and 2-classes are both lower than that of the non-0-class, resulting in fewer samples being judged as non-0-class.
Short answer: You would have to try both and see.
Why?: It would really depend on your data and the algorithm you use (just like for many other machine learning questions..)
For many classification algorithms (e.g. SVM, Logistic Regression), even if you want to do a multi-class classification, you would have to perform a one-vs-all classification, which means you would have to treat class 1 and class 2 as the same class. Therefore, there is no point running a multi-class scenario if you just need to separate out the 0.
For algorithms such as Neural Networks, where having multiple output classes is more natural, I think training a multi-class classifier might be more beneficial if your classes 0, 1 and 2 are very distinct. However, this means you would have to choose a more complex algorithm to fit all three. But the fit would possibly be nicer. Therefore, as already mentioned, you would really have to try both approaches and use a good metric to evaluate the performance (e.g. confusion matrices, F-score, etc..)
I hope this is somewhat helpful.
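As a hedged illustration of "try both and see": the data below is synthetic and logistic regression merely stands in for whichever model you actually use, but the two approaches can be compared directly like this.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the 0/1/2 classes in the question.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10,
    n_classes=3, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Approach 1: collapse classes 1 and 2 into "non-0" and train a binary model.
binary = LogisticRegression(max_iter=1000).fit(X_tr, (y_tr != 0).astype(int))
pred_binary = binary.predict(X_te)

# Approach 2: train on all three classes, then collapse the predictions.
multi = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred_multi = (multi.predict(X_te) != 0).astype(int)

y_te_bin = (y_te != 0).astype(int)
print("binary F1:", f1_score(y_te_bin, pred_binary))
print("multi  F1:", f1_score(y_te_bin, pred_multi))
```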

Logistic Regression only recognizing predominant classes

I am participating in the Kaggle San Francisco Crime competition and I am currently trying a number of different classifiers to benchmark performance. I am using a LogisticRegression classifier from sklearn, without any parameter tuning, and I noticed from sklearn.metrics.classification_report that it is only predicting the predominant classes, i.e. the classes which have the highest number of occurrences in my training set.
Intuition tells me that this comes down to parameter tuning, but I am not sure which parameters I have to tweak in order to make the classifier more aware of less predominant classes (LogisticRegression has quite a few). At the moment it is predicting only 3 classes out of 38 or something like that, so it definitely needs improvement.
Any ideas?
If your model is classifying only the predominant classes, then you are facing a class-imbalance problem. Here are some good reads on tackling this in machine learning.
Logistic regression is a binary classifier and uses the one-vs-all or one-vs-one technique for multiclass classification, which is not good if you have a high number of output classes (38 in your case). Try using another classifier. For a start, use a softmax classifier, which is an extension of the logistic classifier with support for multi-class classification. In scikit-learn, set the multi_class parameter to "multinomial" to use softmax regression.
Another way to improve your model could be to use GridSearch for parameter tuning.
On a side note, I would recommend trying other models as well.
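A minimal sketch combining both suggestions, on a synthetic imbalanced dataset (the class counts and imbalance ratios are made up; class_weight="balanced" is an extra assumption beyond the answer, added to address the imbalance directly):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: one dominant class and several rare ones.
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=12, n_classes=5,
    weights=[0.7, 0.15, 0.05, 0.05, 0.05], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(
    multi_class="multinomial",   # softmax instead of one-vs-rest
    class_weight="balanced",     # upweight the rare classes
    max_iter=1000,
)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```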

What's the meaning of logistic regression dataset labels?

I've been learning logistic regression for a few days, and I think the labels of a logistic regression dataset need to be 1 or 0. Is that right?
But when I look up the libSVM library's regression datasets, I see the label values are continuous numbers (e.g. 1.0086, 1.0089, ...). Did I miss something?
Note that the libSVM library can be used for regression problems.
Thanks so much!
Contrary to its name, logistic regression is a classification algorithm and it outputs class probability conditioned on the data point. Therefore the training set labels need to be either 0 or 1. For the dataset you mentioned, logistic regression is not a suitable algorithm.
SVM is a classification algorithm and it uses input labels of -1 or 1. It is not a probabilistic algorithm and does not output class probabilities. It can also be adapted to regression.
Are you using a 3rd party library or programming this yourself? Generally the labels are used as ground truth so you can see how effective your approach was.
For example, if your algorithm is trying to predict what a particular instance is, it might output -1 while the ground-truth label is +1, which means you did not successfully classify that particular instance.
Note that "regression" is a general term. To say someone will perform regression analysis doesn't necessarily tell you what algorithm they will be using, nor all of the nature of the data sets. All it really tells you is that you have a set of samples with features which you want to use to predict a single outcome value (a model for conditional probability).
One major difference between logistic regression and linear regression is that the former is usually trained on categorical, binary-labeled sample sets, while the latter is trained on real-labeled (ℝ) sample sets.
Any time your labels are real valued, it means you're probably going to use linear regression or similar, or else convert those real valued labels to categorical labels (e.g. via thresholds or bins) if you want to in fact use logistic regression. There is potentially a big difference in the quality and interpretation of your results though, if you try to convert from one such problem setup to another.
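As an illustrative sketch of that distinction, here are the same real-valued targets used both ways; the data and the median threshold are arbitrary choices for the example, not anything taken from libSVM.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic real-valued targets, like those in a regression dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_real = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Real-valued labels -> a regression model.
reg = LinearRegression().fit(X, y_real)

# To use logistic regression instead, first bin the labels into {0, 1}
# (the median threshold here is an arbitrary illustrative choice).
y_binary = (y_real > np.median(y_real)).astype(int)
clf = LogisticRegression().fit(X, y_binary)
print(clf.predict_proba(X[:3]))  # class probabilities, per the answer above
```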
See also Regression Analysis.
