Multi-label classification involving a range of numbers as labels

I have a classification problem where my labels are ratings from 0 to 100, in increments of 1 (e.g. 1, 2, 3, 4, ...).
I have a data set where each row has a name, text corpus, and a rating (0 - 100).
From the text corpus I am trying to extract features that I can feed into my classifier, which will output a corresponding rating per row (0 - 100).
For feature selection, I am thinking of starting with a basic bag of words. My question is about the classification algorithm, however. Is there a classification algorithm in scikit-learn that supports this kind of problem?
I was reading http://scikit-learn.org/stable/modules/multiclass.html, but the algorithms described there seem to expect purely categorical labels, whereas my labels are ordered numeric values.
EDIT: What about the case where I bin my ratings? For example, I could use 10 labels, each covering a range of 10 ratings.

You can use multivariate regression instead of classification. You can cluster the n-gram features from the text corpus to form a dictionary and use it to build a feature set. With this feature set, train a regression model whose output is a continuous value. You can then round the predicted real number to get a discrete rating in the 0-100 range.
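For example, a minimal sketch of that idea with scikit-learn, using a plain bag of n-grams and ridge regression; the example texts and ratings are placeholders, not real data:

# Bag-of-words features + regression, then round back onto the 0-100 rating scale.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

texts = ["great product, loved it", "terrible, broke after a day", "it was okay"]   # placeholder corpus
ratings = [92, 8, 55]                                                               # placeholder ratings

X_train, X_test, y_train, y_test = train_test_split(texts, ratings, test_size=0.33, random_state=0)

vectorizer = CountVectorizer(ngram_range=(1, 2))   # unigrams + bigrams act as the "dictionary"
X_train_bow = vectorizer.fit_transform(X_train)
X_test_bow = vectorizer.transform(X_test)

model = Ridge()
model.fit(X_train_bow, y_train)

# Round the continuous predictions to integers and clip to the valid rating range.
predictions = np.clip(np.round(model.predict(X_test_bow)), 0, 100).astype(int)
print(predictions)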

You can preprocess your data with OneHotEncoder to convert your single 1-to-100 target into 100 binary indicators, one for each value in the interval [1..100]. Then you have 100 labels and can train a multiclass classifier.
That said, I suggest using regression instead.
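If you do go the classification route instead (for instance with the binned labels from the edit above), scikit-learn's classifiers accept integer class labels directly, so no one-hot encoding of the target is needed. A minimal sketch with placeholder data:

# Bin the 0-100 ratings into 10 classes and train an ordinary multiclass classifier
# on bag-of-words features.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["awful", "mediocre at best", "pretty good", "absolutely fantastic"]   # placeholders
ratings = np.array([3, 48, 71, 97])

bins = np.clip(ratings // 10, 0, 9)   # 10 bins of width 10; a rating of 100 folds into the top bin

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, bins)   # integer class labels are fine as-is

print(clf.predict(["surprisingly good"]))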

Related

What does the ranker in Weka PCA tell us about feature selection?

I have a data set of 31000 rows with 13 attributes. Because most of them are categorical, I had to apply NominalToBinary to those attributes, so the attribute count grew to 61.
I sampled the data down to 18000 rows and applied PCA with the Ranker search in Weka. centerData is false, so it should normalise the data for me.
This is my result:
0.945   1   -0.367 Marial_Status= Married-civ-spouse - 0.365 Relationship= Husband + 0.298 Marial_Status= Never-married + 0.244 Age=0_23 + 0.232 Gender= Female ...
I understand that the ranking reflects variance, so rank 1 is 94.5%? The issue I have with feature selection is: how do I know which attributes to keep? Most of these attributes are categorical and were converted to numeric for the PCA. So, for the original data set with both categorical and numeric attributes, what does this output say about feature selection?
PCA assumes numerical data. If you binary-encode your categorical variables, you basically take a hammer and force your data to fit your model's assumptions.
Another way to deal with categorical features is non-linear feature transformations, which find a suitable way to represent distances between categories. A quick Google search turned up Categorical Principal Components Analysis (CATPCA). Maybe have a look at that.
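For a rough feel for what such a run reports outside Weka, here is a hedged scikit-learn sketch (the column names and values are made up): one-hot encode the categoricals, fit PCA, and read explained_variance_ratio_, the fraction of total variance each component captures.

import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame({
    "Marial_Status": ["Married-civ-spouse", "Never-married", "Divorced", "Never-married"],
    "Relationship": ["Husband", "Not-in-family", "Unmarried", "Own-child"],
    "Age": [45, 22, 37, 19],
})

# NominalToBinary-style encoding: every category becomes its own 0/1 column.
X = pd.get_dummies(df, columns=["Marial_Status", "Relationship"]).astype(float)

pca = PCA()
pca.fit(X)

# Proportion of total variance captured by each principal component.
# Note: PCA centres but does not rescale, so the numeric Age column will
# dominate unless you standardise the data first.
print(pca.explained_variance_ratio_)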

Multiple binary classifiers combining

I'm trying to implement a multi-layer perceptron classifier, and I have a data set of 1000 samples. There are 6 features and 5 possible labels.
Based on my understanding for OneVsAll, we create a binary classifier per label and train the classifier with the training data.
However, I don't understand how we combine the results of the 5 binary classifiers. Also, what if the data is noisy and two of the binary classifiers predict that a test sample is positive? And what do we do if all of the binary classifiers predict that a sample is negative; how do we label it then?
Each unit in your output layer should return a value h with 0 < h < 1. In a single binary classifier you would usually choose a threshold, say 0.5, to decide whether the output is a positive or negative result. In the one-vs-all case, you instead choose the label of the output unit with the highest value of h as your predicted label.
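For illustration, a minimal scikit-learn sketch of that argmax rule; the generated data is only a stand-in for the 1000-sample, 6-feature, 5-label set:

# One-vs-all: one binary classifier per label, prediction = label with the highest score.
# The argmax also resolves the "several positives" and "all negatives" cases.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           n_classes=5, random_state=0)

ovr = OneVsRestClassifier(MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0))
ovr.fit(X, y)

# Each column holds one binary classifier's score for "its" label.
scores = np.column_stack([est.predict_proba(X[:5])[:, 1] for est in ovr.estimators_])
print(scores.argmax(axis=1))
print(ovr.predict(X[:5]))   # OneVsRestClassifier applies the same argmax rule internally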

Using LSTM for binary classification

I have time series data of size 100000x5: 100000 samples and five variables. I have labeled each of the 100000 samples as either 0 or 1, i.e. binary classification.
I want to train it using an LSTM because of the time series nature of the data. I have seen examples of LSTMs for time series prediction; is it suitable to use one in my case?
Not sure about your needs.
LSTM is best suited for sequence models, like the time series you mention, but your description doesn't read like a time series problem.
In any case, you can use an LSTM for time series classification rather than prediction, as in this article.
In my experience, for binary classification with only 5 features you can find better methods; an LSTM will consume more memory than other methods and could give worse results.
First of all, you can look at it from a different perspective: instead of 100000 labeled samples of 5 variables, you can treat it as 100000 unlabeled samples of 6 variables, where the 6th variable is the label.
You can therefore train your LSTM as a multivariate predictor for that 6th variable, i.e. the sample label, and compare its predictions with the ground truth during testing to evaluate performance.
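If you do try the LSTM route, here is a hedged Keras sketch; the window length, layer sizes and the per-window labelling rule are assumptions for illustration, not something taken from the question:

# Slice the 100000x5 series into fixed-length windows and classify each window
# with an LSTM followed by a sigmoid output (binary classification).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

rng = np.random.default_rng(0)
series = rng.normal(size=(100000, 5))      # stand-in for the real measurements
labels = rng.integers(0, 2, size=100000)   # stand-in for the 0/1 labels

window = 50
starts = range(0, len(series) - window + 1, window)
X = np.stack([series[i:i + window] for i in starts])
y = np.array([labels[i + window - 1] for i in starts])   # label each window by its last timestep

model = Sequential([
    Input(shape=(window, 5)),
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=64, validation_split=0.1)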

labelling of dataset in machine learning

I have a question about some basic concepts of machine learning. The examples I have seen give only a brief overview. For training, a feature vector is given as input, and in supervised learning the data set is labelled. My confusion is about the labelling. For example, if I have to distinguish between two types of pictures, I provide a feature vector and, on the output side, 1 for type A and 2 for type B. But what if I want to extract a region of interest (ROI) from a data set of images? How would I label my data to extract the ROI using an SVM? I hope I have managed to convey my confusion. Thanks in anticipation.
In supervised learning, such as SVMs, the dataset should be composed as follows:
<i-th feature vector><i-th label>
where i goes from 1 to the number of patterns (also called examples or observations) in your training set, so each such pair represents a single record that can be used to train the SVM classifier.
So you basically have a set of such tuples, and if you have just 2 labels (a binary classification problem) you can easily use an SVM. The SVM model is trained on the training set and the training labels, and once the training phase has finished you can use another set (called the validation set or test set), structured in the same way as the training set, to test the accuracy of your SVM.
In other words, the SVM workflow should be structured as follows (a short code sketch follows the list):
train the SVM using the training set and the training labels
predict the labels for the validation set using the model trained in the previous step
if you know the actual validation labels, you can match the predicted labels against the actual labels and check how many were correctly predicted. The ratio between the number of correctly predicted labels and the total number of labels in the validation set is a scalar in [0, 1] and is called the accuracy of your SVM model.
if you're interested in the ROI, you might want to check the trained SVM parameters (mainly the weights and bias) to reconstruct the separation hyperplane
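A minimal scikit-learn sketch of that workflow; the random feature vectors are placeholders for whatever you extract from the type A / type B images:

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))     # <i-th feature vector> (placeholder)
y = rng.integers(1, 3, size=200)   # <i-th label>: 1 = type A, 2 = type B (placeholder)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

svm = SVC(kernel="linear")
svm.fit(X_train, y_train)            # step 1: train on the training set and labels
predicted = svm.predict(X_val)       # step 2: predict labels for the validation set

# step 3: accuracy = correctly predicted labels / total validation labels, in [0, 1]
print(accuracy_score(y_val, predicted))

# step 4: for a linear kernel, the separating hyperplane parameters
print(svm.coef_, svm.intercept_)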
It is also important to know that the training set records must be correctly labelled a priori: if the training labels are wrong, the SVM will never be able to correctly predict the output for previously unseen patterns. You do not label your data according to the ROI you want to extract; the data must be correctly labelled beforehand. The SVM will see the entire set of type A pictures and the set of type B pictures and will learn the decision boundary that separates them. You should not trick the labels: if you do, you're not doing classification, machine learning or pattern recognition. You're basically tricking the results.

How to do text classification with label probabilities?

I'm trying to solve a text classification problem for academic purposes. I need to classify tweets into labels like "cloud", "cold", "dry", "hot", "humid", "hurricane", "ice", "rain", "snow", "storms", "wind" and "other". Each tweet in the training data has probabilities for all of the labels. Say the message "Can already tell it's going to be a tough scoring day. It's as windy right now as it was yesterday afternoon." has a 21% chance of being "hot" and a 79% chance of being "wind". I have worked on classification problems that predict a single label such as wind or hot. But in this problem, each training example has probabilities for all the labels. I have previously used the Mahout naive Bayes classifier, which takes a single specific label per text to build the model. How do I feed these per-label probabilities into a classifier?
In a probabilistic setting, these probabilities reflect uncertainty about the class label of your training instance. This affects parameter learning in your classifier.
There's a natural way to incorporate this: in Naive Bayes, for instance, when estimating the parameters of your models, instead of each word getting a count of one for the class to which the document belongs, it gets a fractional count equal to the class probability. Thus documents with a high probability of belonging to a class contribute more to that class's parameters. The situation is exactly equivalent to learning a mixture of multinomials model with EM, where the probabilities you have play the role of the membership/indicator variables for your instances.
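As a toy illustration of those fractional counts (the tiny word-count matrix and label probabilities are made up):

# Soft-count parameter estimation for multinomial Naive Bayes: each document adds
# its word counts to every class, weighted by that document's class probabilities.
import numpy as np

X = np.array([        # rows = documents, columns = vocabulary word counts
    [3, 0, 1],
    [0, 2, 2],
    [1, 1, 0],
])
P = np.array([        # per-document probabilities over the classes (e.g. hot, wind, other)
    [0.21, 0.79, 0.00],
    [0.10, 0.10, 0.80],
    [0.60, 0.30, 0.10],
])

word_counts = P.T @ X + 1.0                                          # classes x vocab, Laplace-smoothed
word_probs = word_counts / word_counts.sum(axis=1, keepdims=True)    # P(word | class)
class_priors = P.mean(axis=0)                                        # P(class)

print(word_probs)
print(class_priors)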
Alternatively, if your classifier were a neural net with a softmax output, instead of the target output being a vector with a single 1 and lots of zeros, the target output becomes the probability vector you're supplied with.
I don't, unfortunately, know of any standard implementations that would allow you to incorporate these ideas.
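That said, the softmax variant is straightforward to express in a modern toolkit such as Keras, since categorical cross-entropy accepts full probability distributions as targets; a hedged sketch with placeholder shapes and layer sizes:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

n_features, n_classes = 1000, 12   # e.g. bag-of-words size and the 12 weather labels

X = np.random.rand(500, n_features).astype("float32")                    # placeholder tweet features
Y = np.random.dirichlet(np.ones(n_classes), size=500).astype("float32")  # placeholder label probabilities

model = Sequential([
    Input(shape=(n_features,)),
    Dense(64, activation="relu"),
    Dense(n_classes, activation="softmax"),
])
# The targets are the supplied probability vectors, not one-hot vectors.
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(X, Y, epochs=2, batch_size=32)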
If you want an off-the-shelf solution, you could use a learner that supports multiclass classification and instance weights. Say you have k classes with probabilities p_1, ..., p_k. For each input instance, create k new training instances with identical features, labelled 1, ..., k, and weighted by p_1, ..., p_k respectively.
Vowpal Wabbit is one such learner that supports multiclass classification with instance weights.
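If you would rather stay within scikit-learn than switch to Vowpal Wabbit, the same replicate-and-weight trick can be expressed with sample_weight; a sketch with placeholder data:

# Copy every instance once per class, label the copy with that class, and weight it
# by the class probability via sample_weight.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 20)                      # placeholder tweet features
P = np.random.dirichlet(np.ones(4), size=100)    # placeholder probabilities over k = 4 labels
n, k = P.shape

X_rep = np.repeat(X, k, axis=0)                  # each instance appears k times
y_rep = np.tile(np.arange(k), n)                 # labelled 0, 1, ..., k-1
w_rep = P.ravel()                                # weighted by p_1, ..., p_k

clf = LogisticRegression(max_iter=1000)
clf.fit(X_rep, y_rep, sample_weight=w_rep)
print(clf.predict_proba(X[:3]))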
