How do I compare the performance of two multiclass confusion matrices?

So I have two multiclass confusion matrices from two different classifiers for ten classes.
Both classifiers used the same input data.
How should I compare the two matrices to decide which classifier is the better one?
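One common starting point is to collapse each confusion matrix into a few summary numbers (overall accuracy, macro-averaged F1, Cohen's kappa) and compare those. Below is a minimal sketch, assuming each matrix is a NumPy array with rows as true classes and columns as predicted classes; the metric choices and the function name are illustrative, not something the question prescribes.

```python
import numpy as np

def summarize_confusion_matrix(cm):
    """Summarize a confusion matrix (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / total
    # Cohen's kappa: observed agreement corrected for chance agreement
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {"accuracy": accuracy, "macro_f1": f1.mean(), "kappa": kappa}

# Compare the two classifiers on the same test data:
# summarize_confusion_matrix(cm_classifier_a) vs summarize_confusion_matrix(cm_classifier_b)
```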

Related

How can I tell a multiclass classifier that 2 categories are closely related and therefore misclassification between them should not be penalised?

I have a multiclass classification problem for an e-commerce website, with close to 2000 categories. Categories are across fashion, electronics, appliances etc., and some of these categories are closely related to each other. For example, consider the pairs:
[electric mixer, food processor]
[Lip gloss, lipstick] etc.
I am training a multiclass one-vs-all classifier for this. My question is: how do I tell the classifier that it is okay to misclassify among closely related pairs?
To do so, you have to specify it when you write a custom loss function. This is the same philosophy as applying different weights to classes.
I found this code that could work in your case (using Keras):
https://github.com/keras-team/keras/issues/2115
If you don't want to penalize this confusion at all, that is the same as combining the two categories into one class.
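As one way to make that concrete, here is a minimal sketch of a cost-sensitive loss in Keras/TensorFlow, where a hand-built cost matrix gives a reduced penalty to confusions between a related pair. The class indices, cost values, and function name are illustrative assumptions, not taken from the linked issue.

```python
import numpy as np
import tensorflow as tf

# Hypothetical 4-class example: classes 2 and 3 are "closely related",
# so confusing them costs much less than other mistakes.
num_classes = 4
cost = np.ones((num_classes, num_classes), dtype="float32")
np.fill_diagonal(cost, 0.0)                 # correct predictions cost nothing
cost[2, 3] = cost[3, 2] = 0.1               # reduced penalty for the related pair
cost_matrix = tf.constant(cost)

def cost_sensitive_loss(y_true, y_pred):
    # y_true: one-hot labels, y_pred: softmax probabilities
    y_true = tf.cast(y_true, tf.float32)
    # Pick the cost-matrix row of the true class, then take the expected cost
    weights = tf.matmul(y_true, cost_matrix)           # (batch, num_classes)
    return tf.reduce_sum(weights * y_pred, axis=-1)    # probability mass on costly classes

# model.compile(optimizer="adam", loss=cost_sensitive_loss)
```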
Why do you train a one-vs-all classifier? In most cases it is better to use a multi-class classifier.
You can also chain the results of a high-level classifier (which predicts the broad category) with a few low-level classifiers that predict the specific sub-category.
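As a rough illustration of that chaining idea, here is a minimal sketch using scikit-learn on hypothetical toy data; the label layout and model choice are assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: 2 coarse categories, each covering 2 fine sub-categories
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y_fine = rng.integers(0, 4, size=400)   # fine labels 0..3
y_coarse = y_fine // 2                  # coarse label 0 covers fine {0,1}, label 1 covers {2,3}

# High-level classifier predicts the broad category
coarse_clf = LogisticRegression(max_iter=1000).fit(X, y_coarse)

# One low-level classifier per coarse category, trained only on its own samples
fine_clfs = {c: LogisticRegression(max_iter=1000).fit(X[y_coarse == c], y_fine[y_coarse == c])
             for c in np.unique(y_coarse)}

def predict(x):
    x = x.reshape(1, -1)
    c = coarse_clf.predict(x)[0]        # first the broad category
    return fine_clfs[c].predict(x)[0]   # then the specific sub-category

print(predict(X[0]))
```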

Train multi-class classifier for binary classification

Suppose a dataset contains multiple categories, e.g. 0-class, 1-class and 2-class. The goal is to divide new samples into 0-class or non-0-class.
One can either
combine the 1- and 2-classes into a unified non-0-class and train a binary classifier,
or train a multi-class classifier and collapse its predictions into a binary decision.
How do these two approaches compare in performance?
I think more categories will produce a more accurate decision surface; however, the weights of the 1- and 2-classes are each lower than that of a combined non-0-class, which may result in fewer samples being judged as non-0-class.
Short answer: You would have to try both and see.
Why? It really depends on your data and the algorithm you use (just like for many other machine learning questions...).
For many classification algorithms (e.g. SVM, logistic regression), even if you want to do multi-class classification, you would have to perform one-vs-all classification, which means that the classifier separating class 0 from the rest already treats class 1 and class 2 as one class. Therefore, there is little point running a multi-class scenario if you just need to separate out class 0.
For algorithms such as neural networks, where having multiple output classes is more natural, training a multi-class classifier might be more beneficial if your classes 0, 1 and 2 are very distinct. However, this means you would have to choose a more complex model to fit all three, though the fit might be better. Therefore, as already mentioned, you really have to try both approaches and use a good metric to evaluate performance (e.g. confusion matrices, F-score, etc.).
I hope this is somewhat helpful.
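To show what "try both" can look like in practice, here is a minimal scikit-learn sketch that compares the two approaches on synthetic data; the dataset, model, and metric are illustrative stand-ins for your own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Hypothetical 3-class data; the actual task is "class 0 vs the rest"
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
y_te_bin = (y_te != 0).astype(int)

# Approach 1: merge classes 1 and 2, train a binary classifier
binary = LogisticRegression(max_iter=1000).fit(X_tr, (y_tr != 0).astype(int))
pred_binary = binary.predict(X_te)

# Approach 2: train a multi-class classifier, collapse its predictions afterwards
multi = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred_collapsed = (multi.predict(X_te) != 0).astype(int)

print("binary classifier F1:   ", f1_score(y_te_bin, pred_binary))
print("collapsed multiclass F1:", f1_score(y_te_bin, pred_collapsed))
```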

How to choose classifier on specific dataset

Given a dataset, typically an m-instances-by-n-features matrix, how do you choose the classifier that is most appropriate for it?
This is a bit like asking which algorithm to use to test whether a number is prime. Not every algorithm solves every problem; each problem has a finite set of suitable algorithms. In machine learning you can apply several different algorithms to the same type of problem.
If the matrix contains real-valued features, you can use the KNN algorithm. If the matrix has words as features, you can use a naive Bayes classifier, which is one of the best options for text classification. Machine learning has tons of algorithms; read about them and apply whichever fits your problem best. I hope that makes sense.
An interesting but much more general map I found:
http://scikit-learn.org/stable/tutorial/machine_learning_map/
If you have Weka, you can use the Experimenter and run different algorithms on the same data set to evaluate different models.
This project compares many different classifiers on different typical datasets.
If you have no idea, you could use the simple tool Auto-WEKA, which will test all the different classifiers you select within different constraints. Before using Auto-WEKA, you may need to convert your data to ARFF using Weka or just manually (there are many tutorials on YouTube).
The best classifier depends on your data (binary/string/real/tags, patterns, distribution...), the kind of output to predict (binary class / multi-class / evolving classes / a value from regression?) and the expected performance (time, memory, accuracy). It also depends on whether you want to update your model frequently or not (i.e. if the data is a stream, it is better to use an online classifier).
Please note that the best classifier may not be one but an ensemble of different classifiers.
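To make the "evaluate different models on your own data" advice concrete, here is a minimal scikit-learn sketch that cross-validates a handful of common classifiers on a stand-in dataset; the candidate list and the iris data are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)   # stand-in for your m-by-n feature matrix

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(random_state=0),
}

for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name:15s} {scores.mean():.3f} +/- {scores.std():.3f}")
```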

Late fusion step of classification using libLinear

I am doing classification work that uses libLinear these days.
I have trained two types of feature sets into two models to make predictions for a query input.
Wishing to use late fusion to combine the two model outputs, I modified the liblinear code so that I can get the decision scores for the different classes. So we have two sets of scores to determine which class the query should belong to.
Is there any standard way to do this "late fusion", or should I just intuitively add the two scores for each class and choose the class with the highest score as the candidate?
The standard way to combine multiple classifiers is a weighted sum of the scores of the individual classifiers (see the sketch after this list). Of course, you then have the problem of choosing the weight coefficients. There are different possibilities:
set weights uniformly
set weights proportional to performance of classifier
train a new classifier which takes the scores as input
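Here is a minimal sketch of the weighted-sum fusion, assuming you already have per-class decision scores from the two liblinear models; the score values and accuracies are hypothetical.

```python
import numpy as np

def late_fusion(scores_a, scores_b, w_a=0.5, w_b=0.5):
    """Combine per-class decision scores from two models by a weighted sum
    and return the index of the winning class."""
    fused = w_a * np.asarray(scores_a) + w_b * np.asarray(scores_b)
    return int(np.argmax(fused))

# Hypothetical scores for a 3-class problem from the two models
scores_model1 = [0.2, 1.5, -0.3]
scores_model2 = [0.8, 0.9, 0.1]

# Uniform weights ...
print(late_fusion(scores_model1, scores_model2))
# ... or weights proportional to each model's held-out accuracy (hypothetical values)
acc1, acc2 = 0.82, 0.90
print(late_fusion(scores_model1, scores_model2, acc1 / (acc1 + acc2), acc2 / (acc1 + acc2)))
```

Note that raw decision values from two different models may be on different scales, so it often helps to normalize them (e.g. convert them to probabilities) before fusing.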

Multiple-feature combination for support vector machines

I have two types of feature vectors for a dataset. Each type of feature vector gives a prediction accuracy of about 90% when training an SVM on it.
To achieve higher accuracy, I plan to combine the two types of feature vectors.
My question is which of the two following strategies I should take:
Train one SVM for each type of feature vector, and then combine the prediction results linearly.
Merge the two types of feature vectors into a longer one, and then train a single SVM.
There's no way of telling which one will get you better accuracy. Simply try and see :)
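For "try and see", here is a minimal scikit-learn sketch that evaluates both strategies, early fusion (concatenated features) and late fusion (averaged decision scores), on synthetic stand-in data; the data generation and weighting are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical stand-ins for the two feature types describing the same samples
X1, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X2 = X1 + np.random.default_rng(0).normal(scale=0.5, size=X1.shape)  # second "view"

idx_tr, idx_te = train_test_split(np.arange(len(y)), random_state=0)

# Strategy 2: early fusion -- concatenate the feature vectors, train one SVM
X_cat = np.hstack([X1, X2])
svm_cat = SVC().fit(X_cat[idx_tr], y[idx_tr])
acc_early = accuracy_score(y[idx_te], svm_cat.predict(X_cat[idx_te]))

# Strategy 1: late fusion -- one SVM per feature type, average the decision scores
svm1 = SVC().fit(X1[idx_tr], y[idx_tr])
svm2 = SVC().fit(X2[idx_tr], y[idx_tr])
fused = 0.5 * svm1.decision_function(X1[idx_te]) + 0.5 * svm2.decision_function(X2[idx_te])
acc_late = accuracy_score(y[idx_te], (fused > 0).astype(int))

print("early fusion:", acc_early, "late fusion:", acc_late)
```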
