Combining multiple estimators in sklearn for multilabel multiclass classification

I have been researching ensemble methods in sklearn (see the sklearn ensemble module documentation). I am trying to train several different estimators and combine their results, as in example 1.11.5.1 near the bottom of that page (VotingClassifier). However, my problem is multiclass and multilabel, and VotingClassifier does not support that.
How can I combine the results from multiple different types of estimator in order to classify multiclass multilabel data?
I have tried outputting probabilities for each label and then averaging them, but the results are worse than those of the individual models. Thanks.
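For concreteness, a minimal sketch of the probability-averaging approach described above (the estimator choices and the X_train/Y_train names are placeholders; Y_train is a binary indicator matrix):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Two multilabel-capable estimators trained on the same data.
estimators = [
    RandomForestClassifier(n_estimators=200),                 # handles multilabel natively
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),   # one binary model per label
]
for est in estimators:
    est.fit(X_train, Y_train)   # Y_train: (n_samples, n_labels) 0/1 matrix

def positive_proba(est, X):
    """Return an (n_samples, n_labels) matrix of P(label = 1)."""
    p = est.predict_proba(X)
    if isinstance(p, list):      # RandomForest: one (n_samples, 2) array per label
        return np.column_stack([col[:, 1] for col in p])
    return p                     # OneVsRest: already (n_samples, n_labels)

# Average per-label probabilities across models, then threshold at 0.5.
avg = np.mean([positive_proba(est, X_test) for est in estimators], axis=0)
Y_pred = (avg >= 0.5).astype(int)

If a plain mean underperforms the individual models, weighting the models or tuning the per-label decision threshold on a validation set is the usual next step.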

Related

Can we make a single-label prediction from multi-label features?

Is there any way to predict a single-label output using multi-label features?
I am currently working on a document type prediction model.
Each document has at least one label, and 7 different labels are used in labelling the data.
Given a sequence of documents, I am trying to predict the label of the current document based on the labels of the previous documents.
I'd say this is a multi-class classification problem with multi-label features, since I want the model to output exactly one label for an unseen input.
I've tried both multi-class and multi-label classification in scikit-learn. My impression is that multi-label classification only works when the targets themselves are multi-labelled. Are there any scikit-learn classifiers that can make multi-label --> single-label predictions? If not, are there other ways to do this?
You should try Simple Transformers models. Here is a link where you can explore the different models for multiclass and multilabel classification:
https://simpletransformers.ai/docs/usage/#configuring-a-simple-transformers-model
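If you would rather stay in scikit-learn: once the label sets are encoded as a binary indicator matrix, any multiclass classifier can map multi-label features to a single label. A minimal sketch on toy data (the window size k, the label sets, and the doc_type targets are all illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer

# Chronological documents, each carrying a *set* of labels,
# plus the single document type we want to predict.
history = [{"invoice"}, {"invoice", "scan"}, {"report"}, {"memo", "scan"}, {"invoice"}]
doc_type = ["invoice", "invoice", "report", "memo", "invoice"]

mlb = MultiLabelBinarizer()
B = mlb.fit_transform(history)         # (n_docs, n_labels) 0/1 indicator matrix

k = 2                                   # features: labels of the previous k documents
X = np.array([B[i - k:i].ravel() for i in range(k, len(B))])
y = np.array(doc_type[k:])              # one label per example -> plain multiclass

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[-1:]))              # single-label prediction for the last document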

How does RandomForestClassifier work for classification?

I have learned that Sklearn treats multi-class classification problems as a collection of binary problems. Quoting the Sklearn user guide:
In extending a binary metric to multiclass or multilabel problems, the data is treated as a collection of binary problems, one for each class.
So, binary classification models like LogisticRegression or support vector machines can support multi-class cases by using either the One-vs-One or One-vs-Rest strategy. I wanted to know whether that is the case for RandomForestClassifier too. How about other classifiers in sklearn - are they all used as binary classifiers under the hood when dealing with a multi-class problem?
According to the scikit-learn documentation for decision trees, even multi-output problems only require a small change to the leaves of each tree in a random forest; a single multiclass output needs no special handling at all.
Suppose you have set criterion='gini'. In essence, each node is built by picking a subset of max_features features, calculating the reduction in Gini impurity (which is computed jointly over all N classes) for each candidate split, and choosing the feature-threshold combination that reduces it most.
This means that random forests do not create one model per class. Instead, a single model reduces the criterion metric across all classes at each node of every tree and predicts the most common class at each leaf.
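A quick way to see this in scikit-learn: fit a forest on a three-class problem and note that a single model returns one probability distribution over all three classes, with no One-vs-Rest wrapping:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)              # 3 classes
clf = RandomForestClassifier(criterion='gini', random_state=0).fit(X, y)

print(clf.predict_proba(X[:2]).shape)          # (2, 3): one distribution over all classes
print(clf.classes_)                            # [0 1 2], all handled by the single model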

LDA and PCA on a dataset containing two classes

I would like to compare the accuracy of logistic regression on a dataset after PCA and after LDA. The dataset I am using is the Wisconsin breast cancer dataset, which contains two classes (malignant and benign tumours) and 30 features. I have already run PCA on this data and obtained good accuracy scores with 10 principal components. I know that LDA is similar to PCA. My understanding is that you calculate the mean vector of each feature for each class, compute the scatter matrices, and then get the eigenvalues for the dataset. Is LDA similar to PCA in the sense that I can choose 10 LDA components to better separate my data? I have tried LDA with scikit-learn, but it has only given me one component back. Is this because I only have 2 classes, or do I need to do an additional step? I would like to have 10 LDA components in order to compare with my 10 principal components. Is this even possible?
Both LDA and PCA are linear transformation techniques: LDA is supervised, whereas PCA is unsupervised (it ignores class labels). You can picture PCA as a technique that finds the directions of maximal variance, and LDA as a technique that also cares about class separability. Remember that LDA makes assumptions about normally distributed classes and equal class covariances (at least the standard multiclass version; the generalized version is due to Rao).
As for your actual question: yes, the single component is because you only have two classes. LDA produces at most n_classes - 1 discriminant components, so with two classes you get exactly one, and 10 LDA components are not possible on this dataset.
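A minimal sketch on the same dataset, showing that 10 principal components are fine while LDA is capped at a single component for two classes:

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_breast_cancer(return_X_y=True)    # 2 classes, 30 features

X_pca = PCA(n_components=10).fit_transform(X)
print(X_pca.shape)                            # (569, 10)

# n_components for LDA must be <= min(n_classes - 1, n_features) = 1 here
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)
print(X_lda.shape)                            # (569, 1)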

Loss function for OneVsRestClassifier

I have a OneVsRestClassifier (scikit-learn) which has been trained.
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# penalty='l1' requires a solver that supports it, e.g. 'liblinear' or 'saga'
clf = OneVsRestClassifier(
    LogisticRegression(C=1.2, penalty='l1', solver='liblinear')
).fit(X_train, y_train)
I want to find the loss on my test data. I used the log_loss function, but it does not seem to work because each test case has multiple labels as output. What do I do?
The classification problem that you are referring to is known as a multi-label classification problem. You have made a good decision in using OneVsRestClassifier for this purpose. By default, the score method uses subset accuracy, which is a very harsh metric since it requires you to predict the entire set of labels for each sample correctly.
Some other loss functions provided by scikit-learn that you can use are as follows (a short usage sketch follows the list):
Hamming loss - this measures the Hamming distance between your predicted labels and the true labels, i.e. the fraction of individual label assignments that are wrong.
Jaccard similarity coefficient score - this measures the Jaccard similarity (intersection over union) between your predicted label sets and the true label sets.
Precision, recall and F-measures - in multi-label classification these can be applied to each label independently and then combined across labels using the micro, macro, samples or weighted averages; the scikit-learn user guide on classification metrics explains the options.
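A short sketch of these metrics, assuming the clf from the question and a held-out X_test with a binary indicator matrix y_test:

from sklearn.metrics import (f1_score, hamming_loss, jaccard_score,
                             precision_score, recall_score)

Y_pred = clf.predict(X_test)

print(hamming_loss(y_test, Y_pred))                       # fraction of wrong label assignments
print(jaccard_score(y_test, Y_pred, average='samples'))   # per-sample set overlap
print(precision_score(y_test, Y_pred, average='micro'))
print(recall_score(y_test, Y_pred, average='micro'))
print(f1_score(y_test, Y_pred, average='macro'))          # unweighted mean over labels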
If you also need to rank the labels, as in multi-label ranking problems, scikit-learn provides more advanced metrics (coverage error, label ranking average precision, ranking loss), which are well documented with examples in its model evaluation guide. If you are dealing with that kind of problem, let me know in the comments and I will explain each of these metrics in more detail.
Hope this helps!

Multiple-feature combination for support vector machines

I have two types of feature vectors for a dataset. Each type of feature vector gives a prediction accuracy of about 90% when used to train an SVM.
To achieve higher accuracy, I plan to combine the two types of feature vectors.
My question is which of the two following strategies I should take:
Train one SVM for each type of feature vectors, and then combine the prediction results linearly.
Merge the two types of feature vectors into a longer one, and then train a SVM.
There's no way of telling in advance which one will give you better accuracy. Simply try both and see :)
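Both strategies are only a few lines in scikit-learn, so trying both is cheap. A sketch (X1_*/X2_* are the two feature representations of the same samples; all names are placeholders):

import numpy as np
from sklearn.svm import SVC

# Strategy 1: one SVM per feature type, linear combination of predicted probabilities
svm1 = SVC(probability=True).fit(X1_train, y_train)
svm2 = SVC(probability=True).fit(X2_train, y_train)
p = 0.5 * svm1.predict_proba(X1_test) + 0.5 * svm2.predict_proba(X2_test)
pred_combined = svm1.classes_[p.argmax(axis=1)]

# Strategy 2: concatenate the feature vectors and train a single SVM
merged_train = np.hstack([X1_train, X2_train])
merged_test = np.hstack([X1_test, X2_test])
pred_merged = SVC().fit(merged_train, y_train).predict(merged_test)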
