Encode my multiclass classification problem for an ordinal NN

I want to encode the output variable of my multiclass classification problem in a specific way that takes ordinality into account, and use it in a neural network with a sigmoid objective.
I have a couple of questions about this:
How could I encode my classes in this way?
This would not change the problem from multiclass to multilabel classification, right?
P.S. here is a link to the paper I based this on. And here is a figure representing the change from a normal NN to their adaptation:

1. How could I encode my classes in this way?
It depends on the framework; a PyTorch example can be found here, which also includes a code snippet for converting predictions back to labels.
2. This would not change the problem from multiclass to multilabel classification, right?
No, you would have multiple binary outputs, but they are subsequently converted to a single label, thus it is still multiclass classification.
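For concreteness, here is a minimal NumPy sketch of one common ordinal encoding scheme: class k becomes k leading ones, and a prediction is decoded by counting consecutive sigmoid outputs above 0.5. The exact scheme in the linked paper may differ, and the function names here are illustrative:

```python
import numpy as np

def ordinal_encode(labels, num_classes):
    """Encode class k (0-indexed) as k ones followed by zeros.
    With 4 classes: 0 -> [0,0,0], 1 -> [1,0,0], 2 -> [1,1,0], 3 -> [1,1,1]."""
    encoded = np.zeros((len(labels), num_classes - 1), dtype=np.float32)
    for i, label in enumerate(labels):
        encoded[i, :label] = 1.0
    return encoded

def ordinal_decode(sigmoid_probs, threshold=0.5):
    """Decode to a single label by counting consecutive outputs above the threshold."""
    passed = sigmoid_probs > threshold
    return passed.cumprod(axis=1).sum(axis=1)

# Example: 4 ordered classes -> 3 sigmoid outputs per sample
print(ordinal_encode([0, 2, 3], num_classes=4))
print(ordinal_decode(np.array([[0.9, 0.8, 0.2], [0.9, 0.1, 0.7]])))  # -> [2 1]
```

Note the cumulative product in the decoder: an output above the threshold only counts if all earlier outputs also passed, which keeps the decoded label consistent with the ordinal structure.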

Related

Can we make a single-label prediction from multi-label features?

Is there any way to predict a single-label output using multi-label features?
I am now working with a document type prediction model.
Each document has at least one label and 7 different labels are used in labelling the data.
Given a series of documents, I am trying to predict the label for the current document based on labels of the previous documents.
I'd say this problem is multi-class classification with multi-label features, since I'm trying to make the machine give only one possible label for an unknown input.
I've tried both multi-class and multi-label classification in scikit-learn. My impression is that we can only perform multi-label classification with multi-labelled data. Are there any scikit-learn classifiers that can do multi-label --> single-label predictions? If not, are there any other ways to do so?
You could try Simple Transformer models. Here is a link where you can explore the different model configurations for multiclass and multilabel classification:
https://simpletransformers.ai/docs/usage/#configuring-a-simple-transformers-model
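Alternatively, the multi-label features themselves can be multi-hot encoded, after which any ordinary multiclass classifier applies. A minimal scikit-learn sketch with made-up document labels (all data and label names here are hypothetical):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical history: label sets of three previous documents (7 labels exist)
history = [{"invoice", "legal"}, {"report"}, {"invoice"}]
next_label = ["report", "invoice", "legal"]  # single label of the following document

# Multi-hot encode the multi-label features ...
mlb = MultiLabelBinarizer(classes=["contract", "email", "invoice", "legal",
                                   "memo", "report", "scan"])
X = mlb.fit_transform(history)  # shape (3, 7)

# ... then fit any ordinary multiclass (single-label output) classifier on top
clf = LogisticRegression(max_iter=1000).fit(X, next_label)
print(clf.predict(mlb.transform([{"invoice"}])))
```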

Predicting over data that has categorical, numerical and text

I am trying to build a classifier for my dataset. Each observation in the data has categorical and numerical values, as well as a more general description in free text. I understand how to build a boosting algorithm to handle the categorical and numerical values, and I have already trained a neural network that predicted over the text quite successfully. What I can't wrap my head around is how to integrate both approaches.
Embed your free text using a language model (e.g. by averaging fastText word embeddings, or using the Google Universal Sentence Encoder) into an N-dimensional vector of floats. One-hot encode the categorical features. Concatenate [embedding, one_hot_encoding, numericals] and, badabing badaboom, you've got yourself one vector representing your data point.
TensorFlow Hub's KerasLayer + https://tfhub.dev/google/universal-sentence-encoder/4 is definitely a good starting point. If you need to train something yourself, you could look into tf.keras.layers.Embedding.
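A minimal sketch of that concatenation, assuming hypothetical columns and the Universal Sentence Encoder from TF Hub:

```python
import numpy as np
import tensorflow_hub as hub
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical columns: free text, one categorical field, two numeric fields
texts = ["late payment notice", "quarterly earnings summary"]
categories = [["finance"], ["reports"]]
numericals = [[3.0, 120.0], [1.0, 45.0]]

# 512-dim sentence embeddings from the Universal Sentence Encoder
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
text_vecs = embed(texts).numpy()

# One-hot encode categoricals, scale numericals (sparse_output needs sklearn >= 1.2)
cat_vecs = OneHotEncoder(sparse_output=False).fit_transform(categories)
num_vecs = StandardScaler().fit_transform(numericals)

# One flat feature vector per data point, ready for any downstream classifier
X = np.concatenate([text_vecs, cat_vecs, num_vecs], axis=1)
print(X.shape)  # (2, 512 + 2 + 2)
```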

Text Classification: Multilabel Text Classification vs Multiclass Text Classification

I have a question about the approach to deal with a multilabel classification problem.
Based on a literature review, I found that the most commonly used approach is the problem transformation approach. It transforms the multilabel problem into a number of single-label problems, and the classification result is simply the union of each single-label classifier's output, using the binary relevance approach.
Since a single-label problem can be categorized as either binary classification (if there are two labels) or multiclass classification (if there are more than two labels), the current transformation approaches all seem to transform the multilabel problem into a number of binary problems. But this can cause a data imbalance issue, because the negative class may have many more documents than the positive class.
So my question is: why not transform into a number of multiclass problems, and then apply direct multiclass classification algorithms to avoid the data imbalance problem? In this case, for one test document, each trained single-label multiclass classifier would predict whether to assign its label, and the union of all such predictions would be the final set of labels for that test document.
In summary, compared to transforming a multilabel classification problem into a number of binary classification problems, transforming it into a number of multiclass classification problems could avoid the data imbalance problem. Other than this, everything stays the same for the two methods: you need to construct |L| single-label (either binary or multiclass) classifiers, where |L| is the total number of distinct labels in the problem; you need to prepare |L| sets of training and test data; and you need to run each single-label classifier on the test document, with the union of their predictions forming the final label set.
I hope someone can help clarify my confusion, thanks very much!
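For reference, the binary relevance setup described in the question, as a minimal scikit-learn sketch (the data is made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

X = np.array([[0.1, 1.2], [0.8, 0.3], [0.5, 0.9], [0.2, 1.1]])
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]])  # |L| = 3 labels

# Binary relevance: one independent binary classifier per label; the
# prediction is the union of the labels whose classifier fires.
br = MultiOutputClassifier(LogisticRegression()).fit(X, Y)
print(br.predict([[0.15, 1.15]]))  # e.g. [[1 0 1]]
```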
What you describe is a known transformation strategy to multi-class problems, called the Label Powerset (LP) transformation strategy.
Drawbacks of this method:
- The LP transformation may lead to up to 2^|L| transformed labels.
- The class imbalance problem.
Refer to:
Cherman, Everton Alvares, Maria Carolina Monard, and Jean Metz. "Multi-label problem transformation methods: a case study." CLEI Electronic Journal 14.1 (2011): 4-4.
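To make the LP strategy concrete, here is a minimal sketch (with made-up data) that maps each distinct label combination to one class of an ordinary multiclass problem:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.1, 1.2], [0.8, 0.3], [0.5, 0.9], [0.2, 1.1]])
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]])  # multilabel targets

# Each distinct label combination becomes one class id (up to 2^|L| of them)
combos, y_lp = np.unique(Y, axis=0, return_inverse=True)

clf = DecisionTreeClassifier().fit(X, y_lp)

# Decode a prediction back to its label set
pred = clf.predict([[0.15, 1.15]])
print(combos[pred[0]])  # e.g. [1 0 1]
```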

Logistic Regression only recognizing predominant classes

I am participating in the Kaggle San Francisco Crime competition and I am currently trying a number of different classifiers to establish benchmark performance. I am using LogisticRegression from sklearn, without any parameter tuning, and I noticed from sklearn.metrics.classification_report that it is only predicting the predominant classes, i.e. the classes with the highest number of occurrences in my training set.
Intuition tells me that this comes down to parameter tuning, but I am not sure which parameters I have to tweak to make the classifier more aware of the less predominant classes (LogisticRegression has quite a few). At the moment it is predicting only 3 classes out of 38 or so, so it definitely needs improvement.
Any ideas?
If your model is classifying only the predominant classes, then you are facing a class imbalance problem. Here are some good reads on tackling this in machine learning.
Logistic regression is a binary classifier and uses the one-vs-rest or one-vs-one technique for multiclass classification, which is not ideal when you have a high number of output classes (38 in your case). Try using another classifier. For a start, use a softmax classifier, which is an extension of the logistic classifier with support for multiclass classification. In scikit-learn, set the multi_class parameter to 'multinomial' to use softmax regression.
Another way to improve your model could be to use GridSearchCV for parameter tuning.
On a side note, I would recommend trying other models as well.
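A minimal sketch combining two of those suggestions, on synthetic imbalanced data (everything here is illustrative, not the Kaggle set): multinomial logistic regression plus class weighting so the rare classes are not drowned out:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the real problem
X, y = make_classification(n_samples=2000, n_classes=5, n_informative=8,
                           weights=[0.6, 0.2, 0.1, 0.05, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(multi_class="multinomial",  # softmax over all classes
                         class_weight="balanced",    # upweight rare classes
                         max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```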

What's the meaning of logistic regression dataset labels?

I've been learning logistic regression for a few days, and I think the labels of a logistic regression dataset need to be 1 or 0. Is that right?
But when I look at the libSVM library's regression datasets, I see the label values are continuous numbers (e.g. 1.0086, 1.0089, ...). Did I miss something?
Note that the libSVM library can be used for regression problems.
Thanks so much!
Contrary to its name, logistic regression is a classification algorithm, and it outputs the class probability conditioned on the data point. Therefore the training set labels need to be either 0 or 1. For the dataset you mentioned, logistic regression is not a suitable algorithm.
SVM is a classification algorithm and uses input labels of -1 or 1. It is not a probabilistic algorithm and does not output class probabilities. It can, however, be adapted to regression, which is why libSVM ships regression datasets with continuous labels.
Are you using a third-party library or programming this yourself? Generally the labels are used as ground truth, so you can see how effective your approach was.
For example, if your algorithm predicts -1 for a particular instance while the ground-truth label is +1, it means you did not successfully classify that instance.
Note that "regression" is a general term. To say someone will perform regression analysis doesn't necessarily tell you what algorithm they will be using, nor all of the nature of the data sets. All it really tells you is that you have a set of samples with features which you want to use to predict a single outcome value (a model for conditional probability).
One major difference between logistic regression and linear regression is that the former is usually trained on categorical, binary-labeled sample sets, while the latter is trained on real-labeled (ℝ) sample sets.
Any time your labels are real-valued, you're probably going to use linear regression or something similar, or else convert those real-valued labels to categorical labels (e.g. via thresholds or bins) if you want to use logistic regression. There is potentially a big difference in the quality and interpretation of your results, though, if you convert from one such problem setup to the other.
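A minimal sketch of that conversion on made-up data: real-valued targets go to linear regression directly, or get thresholded into binary labels before logistic regression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_real = X @ np.array([0.5, -0.2, 1.0]) + rng.normal(scale=0.1, size=200)

# Real-valued labels: linear regression is the natural fit
lin = LinearRegression().fit(X, y_real)

# To use logistic regression instead, threshold the labels into two bins
y_bin = (y_real > np.median(y_real)).astype(int)
log = LogisticRegression().fit(X, y_bin)
print(log.predict_proba(X[:3]))  # class probabilities, not real values
```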
See also Regression Analysis.
