Multi-class classification using LIBLINEAR - machine-learning

I am a beginner to SVM which i have successfully implemented one-class classification.Now i want to know about multi-class classification which am very much confused about.
I went through How to do multi class classification using Support Vector Machines (SVM) which i want the exact same output but the link does not have a specific example using windows.If anyone can help me out with an example in windows for both “ONE-AGAINST-ONE”,”ONE-AGAINST-ALL” methods of multi-class classification
Thanks

Using libLinear you will not be able to have a similar output because it cannot predict probabilities. You should use libSVM for that.
LibLinear does not support multi-class classification by default, but you can download this tool from the official site and it can do the job.
If you want multi-class probability estimate, you can take a look at this tool

I think the other answer is talking about multi-label classification or something because liblinear does support multi-class classification by default (Choose -s to be between 0 and 7, there are 8 different modes https://github.com/cjlin1/liblinear/blob/master/README)
For the input, you can use the same as binary classification, but just set the label to be the index of the class (between 0 and (# of classes - 1)) instead of ±1.

Related

Can linear classification take non binary targets?

I'm following a TensorFlow example that takes a bunch of features (real estate related) and "expensive" (ie house price) as the binary target.
I was wondering if the target could take more than just a 0 or 1. Let's say, 0 (not expensive), 1 (expensive), 3 (very expensive).
I don't think this is possible as the logistic regression model has asymptotes nearing 0 and 1.
This might be a stupid question, but I'm totally new to ML.
I think I found the answer myself. From Wikipedia:
First, the conditional distribution y|x is a Bernoulli distribution rather than a Gaussian distribution, because the dependent variable is binary. Second, the predicted values are probabilities and are therefore restricted to (0,1) through the logistic distribution function because logistic regression predicts the probability of particular outcomes.
Logistic Regression is defined for binary classification tasks.(For more details, please logistic_regression. For multi-class classification problems, you can use Softmax Classification algorithm. Following tutorials shows how to write a Softmax Classifier in Tensorflow Library.
Softmax_Regression in Tensorflow
However, your data set is linearly non-separable (most of the time this is the case in real-world datasets) you have to use an algorithm which can handle nonlinear decision boundaries. Algorithm such as Neural Network or SVM with Kernels would be a good choice. Following IPython notebook shows how to create a simple Neural Network in Tensorflow.
Neural Network in Tensorflow
Good Luck!

Logistic Regression only recognizing predominant classes

I am participating in the Kaggle San Francisco Crime competition and i am currently trying o number of different classifiers to test benchmark performances. I am using a LogisticRegressionClassifier from sklearn, without any parameter tuning and I noticed from sklearn.metrict.classification_report that it is only predicting the predominant classses,i.e. the classes which have the highest number of occurrences in my training set.
Intuition tells me that this has to parameter tuning, but I am not sure which parameters I have to tweek in order to make the classifier more aware of less predominant classes ( LogisticRegressionClassifier has quite a few ). At the moment it is predicting only 3 classes from 38 or smth like that so it definitely needs improvement.
Any ideas?
If your model is classifying only predominant classes then you are facing problem of imbalance classes. Here are some good reads to tackle this in machine learning.
Logistic Regression is a binary classifier and uses one-vs-all or one-vs-one technique for multiclass classification, which is not good if you have higher number of output classes (33 in your case). Try using other classifier. For a start , use softmax classifier which is an extension of logistic classifier having support for multi-class classification. In scikit learn, set multi_class variable as multinomial to use softmax regression.
Other way to improve your model could be using GridSearch for parameter tuning.
On a side note, I would recommend you to use other models as well.

Machine Learning Text Classification technique

I am new to Machine Learning.I am working on a project where the machine learning concept need to be applied.
Problem Statement:
I have large number(say 3000)key words.These need to be classified into seven fixed categories.Each category is having training data(sample keywords).I need to come with a algorithm, when a new keyword is passed to that,it should predict to which category this key word belongs to.
I am not aware of which text classification technique need to applied for this.do we have any tools that can be used.
Please help.
Thanks in advance.
This comes under linear classification. You can use naive-bayes classifier for this. Most of the ml frameworks will have an implementation for naive-bayes. ex: mahout
Yes, I would also suggest to use Naive Bayes, which is more or less the baseline classification algorithm here. On the other hand, there are obviously many other algorithms. Random forests and Support Vector Machines come to mind. See http://machinelearningmastery.com/use-random-forest-testing-179-classifiers-121-datasets/ If you use a standard toolkit, such as Weka, Rapidminer, etc. these algorithms should be available. There is also OpenNLP for Java, which comes with a maximum entropy classifier.
You could use the Word2Vec Word Cosine distance between descriptions of each your category and keywords in the dataset and then simple match each keyword to a category with the closest distance
Alternatively, you could create a training dataset from already matched to category, keywords and use any ML classifier, for example, based on artificial neural networks by using vectors of keywords Cosine distances to each category as an input to your model. But it could require a big quantity of data for training to reach good accuracy. For example, the MNIST dataset contains 70000 of the samples and it allowed me reach 99,62% model's cross validation accuracy with a simple CNN, for another dataset with only 2000 samples I was able reached only about 90% accuracy
There are many classification algorithms. Your example looks to be a text classification problems - some good classifiers to try out would be SVM and naive bayes. For SVM, liblinear and libshorttext classifiers are good options (and have been used in many industrial applcitions):
liblinear: https://www.csie.ntu.edu.tw/~cjlin/liblinear/
libshorttext:https://www.csie.ntu.edu.tw/~cjlin/libshorttext/
They are also included with ML tools such as scikit-learna and WEKA.
With classifiers, it is still some operation to build and validate a pratically useful classifier. One of the challenges is to mix
discrete (boolean and enumerable)
and continuous ('numbers')
predictive variables seamlessly. Some algorithmic preprocessing is generally necessary.
Neural networks do offer the possibility of using both types of variables. However, they require skilled data scientists to yield good results. A straight-forward option is to use an online classifier web service like Insight Classifiers to build and validate a classifier in one go. N-fold cross validation is being used there.
You can represent the presence or absence of each word in a separate column. The outcome variable is desired category.

Multi-label prediction using LIBLINEAR

I am using LIBLINEAR and i need to know whether Multi-Label Prediction in windows is possible or not.I tried google but no luck
I want the output to be produced the following way
I train some 10 documents with three class labels 1,2,3 and now when i feed a test document to the classifier and if the document belongs to label 1 and 2 then it should produce 1,2 or something else which shows that document belongs to 1 and 2 both the class labels
I want an example in windows
Thanks
By default neither libSvm nor LibLinear supports multiclass classification.
You need to perform a One against all approach.
You can find help on the libsvm page which provides different tools for multi-label classification that are based on LIBSVM or LIBLINEAR

Difference, Binary vs multi-class classification

Is there any difference in the implementation between binary and multi-class classification?
Certainly -- a binary classifier does not automatically help in performing multi-class classification since "multi" might be > 2.
A standard technique to fake N-class with a binary classifier is to build N binary classifiers for each of the labels and then see which of the N binary classifiers is most confident in its class, and choose that. So, that much at least is different in this case.
There are also algorithms that more directly support multi-class, like random decision forests. Since 2 is a special case of multi, no, there would be no difference in applying RDF to a binary vs multi-class problem.

Resources