Can we make a single-label prediction from multi-label features?

Is there any way to predict a single-label output using multi-label features?
I am currently working on a document type prediction model.
Each document has at least one label, and 7 different labels are used in labelling the data.
Given a series of documents, I am trying to predict the label for the current document based on the labels of the previous documents.
I'd say this problem is multi-class classification with multi-label features, since I'm trying to make the machine give exactly one label for an unknown input.
I've tried both multi-class and multi-label classification in scikit-learn. My impression is that multi-label classification is only possible with multi-labelled data. Are there any scikit-learn classifiers that can do multi-label --> single-label predictions? If not, are there any other ways to do so?

You could try Simple Transformers. Here is a link where you can explore the different models for multiclass and multilabel classification:
https://simpletransformers.ai/docs/usage/#configuring-a-simple-transformers-model
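
Alternatively, here is a minimal scikit-learn sketch of the setup the question describes: binarize each document's multi-label features with MultiLabelBinarizer and feed the resulting 0/1 vectors to any ordinary multi-class classifier. The data and label names below are made up for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each feature is the label set of the previous
# document; the target is the single label of the current document.
prev_labels = [["invoice"], ["invoice", "contract"], ["report"], ["contract", "memo"]]
current_label = ["contract", "report", "invoice", "memo"]

# Binarize the multi-label features into fixed-length 0/1 vectors.
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(prev_labels)

# Any multi-class classifier can consume these vectors and emit one label.
clf = LogisticRegression()
clf.fit(X, current_label)

print(clf.predict(mlb.transform([["invoice", "contract"]])))
```

The point is that "multi-label" only needs to describe the target, not the features: once the label sets are encoded as indicator vectors, the problem is plain multi-class classification.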

Related

Encode my multiclass classification problem for ordinal NN

I want to encode my multiclass classification output variable in a specific way to take ordinality into account. I want to use this in an NN with a sigmoid objective.
I have a couple of questions about this:
How could I encode my classes in this way?
This would not change the problem from multiclass to multilabel classification right?
P.S. here is a link to the paper I based this on. And here is a figure representing the change from a normal NN to their adaptation:
1. How could I encode my classes in this way?
It depends on the framework; a PyTorch example can be found here, which also includes a code snippet for converting predictions back to labels.
2. This would not change the problem from multiclass to multilabel classification, right?
No. You would have multiple binary outputs, but they are subsequently converted to a single label, so it is still multiclass classification.
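
In case the linked example goes stale, here is a framework-agnostic sketch of the usual cumulative encoding from the ordinal-NN literature. The helper names are mine, and I am assuming the standard "first k outputs are 1" construction; the NN itself would be trained with sigmoid outputs and a binary cross-entropy loss on these targets.

```python
import numpy as np

def ordinal_encode(y, num_classes):
    """Encode class k (0-indexed) as k ones followed by zeros.
    With 4 classes: 0 -> [0,0,0], 1 -> [1,0,0], 2 -> [1,1,0], 3 -> [1,1,1]."""
    enc = np.zeros((len(y), num_classes - 1), dtype=np.float32)
    for i, k in enumerate(y):
        enc[i, :k] = 1.0
    return enc

def ordinal_decode(probs, threshold=0.5):
    """Predicted class = number of sigmoid outputs above the threshold."""
    return (np.asarray(probs) > threshold).sum(axis=1)

y = [0, 2, 3, 1]
targets = ordinal_encode(y, num_classes=4)  # train the NN on these targets
print(ordinal_decode(targets))              # recovers [0, 2, 3, 1]
```

Because decoding collapses the binary outputs back to one class index, the task stays multiclass, as the answer above notes.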

Combining multiple estimators in SKlearn for multilabel multiclass

I have been researching ensemble methods in sklearn: sklearn ensemble. I am trying to train several different estimators and combine the results, as in example 1.11.5.1 (near the bottom of the page, VotingClassifier). However, my problem is multiclass multilabel and this is not supported.
How can I combine the results from multiple different types of estimator, in order to classify multiclass multilabel data?
I have tried outputting probabilities for each label and then averaging them, but the results are worse than those of the individual models. Thanks.
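
For reference, a minimal sketch of the manual soft voting the question describes (synthetic data, and the choice of base models is arbitrary):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic multilabel data stands in for the real problem.
X, Y = make_multilabel_classification(n_samples=200, n_classes=5, random_state=0)

# Manual soft voting: average each label's probability across models,
# then threshold to get the final indicator matrix.
models = [LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)]
probas = [OneVsRestClassifier(m).fit(X, Y).predict_proba(X) for m in models]
Y_pred = (np.mean(probas, axis=0) >= 0.5).astype(int)
```

If the averaged probabilities underperform the individual models, it may be worth calibrating each base model first (e.g. with CalibratedClassifierCV), since averaging uncalibrated scores from different model families can hurt.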

Text Classification: Multilable Text Classification vs Multiclass Text Classification

I have a question about how to approach a multilabel classification problem.
Based on a literature review, I found that the most commonly used approach is the problem transformation approach. It transforms the multilabel problem into a number of single-label problems, and the classification result is simply the union of the outputs of each single-label classifier (the binary relevance approach).
Since a single-label problem can be categorized as either binary classification (if there are two labels) or multiclass classification (if there are more than two labels), the current transformation approaches all seem to transform the multilabel problem into a number of binary problems. But this can cause a data imbalance issue, because the negative class may have many more documents than the positive class.
So my question is: why not transform into a number of multiclass problems, and then apply multiclass classification algorithms directly to avoid the data imbalance problem? In this case, for one test document, each trained single-label multiclass classifier would predict whether to assign its label, and the union of all such predictions would be the final set of labels for that test document.
In summary, compared to transforming a multilabel classification problem into a number of binary classification problems, transforming it into a number of multiclass classification problems could avoid the data imbalance problem. Other than this, everything stays the same for the two methods: you construct |L| single-label (either binary or multiclass) classifiers, where |L| is the total number of distinct labels in the problem; you prepare |L| sets of training and test data; and you test each single-label classifier on the test document, with the union of the prediction results forming the final label set.
I hope someone can help clarify my confusion, thanks very much!
What you describe is a known strategy for transforming a multi-label problem into a multi-class problem, called the Label Powerset (LP) transformation strategy.
Drawbacks of this method:
- The LP transformation may lead to up to 2^|L| transformed classes.
- The class imbalance problem.
Refer to:
Cherman, Everton Alvares, Maria Carolina Monard, and Jean Metz. "Multi-label problem transformation methods: a case study." CLEI Electronic Journal 14.1 (2011): 4-4.
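
For concreteness, a minimal sketch of the LP transformation (toy label sets, and the variable names are mine): each distinct label *combination* becomes one class of an ordinary multi-class problem.

```python
import numpy as np

# Hypothetical label sets for 5 documents.
Y = [["a"], ["a", "b"], ["b", "c"], ["a"], ["a", "b"]]

# Label powerset: map each distinct label combination to a single class id.
combos = [frozenset(s) for s in Y]
class_ids = {c: i for i, c in enumerate(dict.fromkeys(combos))}
y_lp = np.array([class_ids[c] for c in combos])
print(y_lp)  # e.g. [0 1 2 0 1] -- now an ordinary multi-class target
```

This makes the drawbacks above concrete: with |L| labels there can be up to 2^|L| such combinations, and rare combinations yield classes with very few training examples.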

Logistic Regression only recognizing predominant classes

I am participating in the Kaggle San Francisco Crime competition and I am currently trying a number of different classifiers to benchmark performance. I am using LogisticRegression from sklearn, without any parameter tuning, and I noticed from sklearn.metrics.classification_report that it is only predicting the predominant classes, i.e. the classes which have the highest number of occurrences in my training set.
Intuition tells me that this comes down to parameter tuning, but I am not sure which parameters I have to tweak in order to make the classifier more aware of the less predominant classes (LogisticRegression has quite a few). At the moment it is predicting only 3 classes out of 38 or so, so it definitely needs improvement.
Any ideas?
If your model is classifying only the predominant classes then you are facing a class imbalance problem. Here are some good reads on tackling this in machine learning.
Logistic regression is a binary classifier and uses a one-vs-rest or one-vs-one technique for multiclass classification, which does not work well when you have a large number of output classes (38 in your case). Try using another classifier. For a start, use a softmax classifier, which is an extension of the logistic classifier with native support for multi-class classification. In scikit-learn, set the multi_class parameter to 'multinomial' to use softmax regression.
Another way to improve your model could be to use GridSearchCV for parameter tuning.
On a side note, I would recommend trying other models as well.
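
A sketch combining both suggestions: softmax (multinomial) logistic regression with class weighting, plus a small hypothetical grid over the regularization strength. Note that recent scikit-learn versions use the multinomial formulation by default and deprecate the multi_class flag, so the flag is only needed on older versions; X_train and y_train stand in for the competition data.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Softmax (multinomial) logistic regression; class_weight='balanced'
# reweights rare classes so the model does not ignore them.
clf = LogisticRegression(multi_class="multinomial", solver="lbfgs",
                         class_weight="balanced", max_iter=1000)

# Hypothetical grid over the regularization strength C.
grid = GridSearchCV(clf, {"C": [0.01, 0.1, 1, 10]},
                    scoring="neg_log_loss", cv=3)
# grid.fit(X_train, y_train)  # X_train / y_train from the competition data
```

The class_weight='balanced' option is often the single biggest lever here: it scales each class's loss contribution inversely to its frequency, which directly counters the "only predicts predominant classes" behaviour.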

Multi-label prediction using LIBLINEAR

I am using LIBLINEAR and I need to know whether multi-label prediction is possible on Windows or not. I tried Google but had no luck.
I want the output to be produced in the following way:
I train some 10 documents with three class labels 1, 2, 3; when I then feed a test document to the classifier, if the document belongs to labels 1 and 2 it should produce "1,2" or something else that shows the document belongs to both class labels 1 and 2.
I want an example on Windows.
Thanks
Out of the box, neither LIBSVM nor LIBLINEAR supports multi-label classification.
You need to perform a one-vs-rest (binary relevance) approach.
You can find help on the LIBSVM page, which provides different tools for multi-label classification that are based on LIBSVM or LIBLINEAR.
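
If scikit-learn is an option (it runs on Windows), a minimal sketch of binary relevance over LIBLINEAR: LinearSVC wraps LIBLINEAR, and OneVsRestClassifier trains one binary model per label. The features below are made up; real documents would be vectorized first.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Made-up stand-in for vectorized documents.
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 1.0]])
label_sets = [[1], [1, 2], [3], [2, 3]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)  # indicator matrix, one column per label

# LinearSVC wraps LIBLINEAR; one-vs-rest trains one binary model per label.
clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)
print(mlb.inverse_transform(clf.predict(X)))  # e.g. [(1,), (1, 2), (3,), (2, 3)]
```

A test document that fires the models for labels 1 and 2 comes back as the tuple (1, 2), which matches the "1,2" output format the question asks for.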
