Query in training - machine-learning

Am using libsvm for image classification in matlab. What does training_label_vector exactly mean in "svmtrain" command. What does testing_label_vector and testing_instance_matrix in "svmpredict". After training how to use the results.

For SVM, each example contains two parts: an input object (typically a vector containing the data) and a desired output value or label to specify which object/class it belongs to. Training Label vector basically represents the class the vector belongs to. For the two-class classification, the values for Training Label will be 1 or -1. So some of the features will be given a label as 1 as some as -1. This applies to Testing Label vector. Testing Instance matrix represents the data which you are trying to test the model with.
After training, the model will be outputted and you have to test with the testing matrix and testing labels to get the accuarcy of the classifier.
To read more on SVM, this is a good link: http://www.tristanfletcher.co.uk/SVM%20Explained.pdf

Related

Using probabilities from a model as features to another

I'd like to use the probability output from a model as features to another model.
For instance, I want to determine what kind of bird is on a picture, I want to use a CNN, train it and then use the probability result with other data, like size and weight from the bird, and feed it to a svm.
Do I need to use training and testing set for extracting these probabilities using the CNN? Should I devide my dataset into folds and then extract the probabilities for each different testing fold or can I just train and test on all my data and save the probabilities?
A test set is intended to validate your classifier reaches its goals, or alternatively to set hyper-parameters. In this case, you're not interested in the output of the CNN, as it's just an intermediate layer in the bigger picture.
Having said that, you're apparently not back-propagating SVM errors through its inputs. That's the consequence of a two-stage model. If you did, you'd be optimizing the CNN for use as input to that particular SVM.

Extract WHY label was chosen on classification?

I currently have a system set up where I train from old posts/categories and try to predict what category a new post will be. I am using a pipeline with TfidfVectorizer and LinearSVC to train the dataset and storing that in a pickle, then I process new posts by loading that pickle and using predict from the loaded pickle to classify the new posts. Currently, I am struggling with a few labels and I don't know why.
I am looking to provide some output on what words were triggered in the new post for each classification label so that I can see why a certain label was chosen when classifying new data against a training set, but I cannot find a way to do this.
I know that I can output the top features in my vectorizer when I am training, but how can I output essentially the reason why a certain label was chosen over another one?
During the training phase of the SVM for each word of the corpus vocabulary you learn a weight for each of the classes.
Then, during inference, you calculate the dot product between the class weights and the vector description of the instance to be classified. The algorithm returns the class that yields the highest dot product scores. Hence, you can have an estimate of how things work by examining those weights (coef_ attribute) for your instance.
I agree however that other methods like trees are more interpretable.

labelling of dataset in machine learning

I have a question about some basic concepts of machine learning. The examples, I observed, were giving a brief overview .For training the system, feature vector is given as input. In case of supervised learning, the dataset is labelled. I have confusion about labelling. For example if I have to distinguish between two types of pictures, I will provide a feature vector and on output side for testing, I'll provide 1 for type A and 2 for type B. But if I want to extract a region of interest from a dataset of images. How will I label my data to extract ROI using SVM. I hope I am able to convey my confusion. Thanks in anticipation.
In supervised learning, such as SVMs, the dataset should be composed as follows:
<i-th feature vector><i-th label>
where i goes from 1 to the number of patterns (also examples or observations) in your training set so this represents a single record in your training set which can be used to train the SVM classifier.
So you basically have a set composed by such tuples and if you do have just 2 labels (binary classification problem) you can easily use a SVM. Indeed the SVM model will be trained thanks to the training set and the training labels and once the training phase has finished you can use another set (called Validation Set or Test Set), which is structured in the same way as the training set, to test the accuracy of your SVMs.
In other words the SVM workflow should be structured as follows:
train the SVM using the training set and the training labels
predict the labels for the validation set using the model trained in the previous step
if you know what the actual validation labels are, you can match the predicted labels with the actual labels and check how many labels have been correctly predicted. The ratio between the number of correctly predicted labels and the total number of labels in the validation set returns a scalar between [0;1] and it's called the accuracy of your SVM model.
if you're interested in the ROI, you might want to check the trained SVM parameters (mainly the weights and bias) to reconstruct the separation hyperplane
It is also important to know that the training set records should be correctly, a priori labelled: if the training labels are not correct, the SVM will never be able to correctly predict the output for previously unseen patterns. You do not have to label your data according to the ROI you want to extract, the data must be correctly labelled a priori: the SVM will have the entire set of type A pictures and the set of type B pictures and will learn the decision boundary to separate pictures of type A and pictures of type B. You do not have to trick the labels: if you do, you're not doing classification and/or machine learning and/or pattern recognition. You're basically tricking the results.

Qualitative Classification in Neural Network on Weka

I have a training set where the input vectors are speed, acceleration and turn angle change. Output is a crisp class- an activity state from the given set {rest, walk, run}. e.g- say for input vectors [3.1 1.2 2]-->run ; [2.1 1 1]-->walk and so on.
I am using weka to develop a Neural Network model. The output I am defining as crisp ones (or rather qualitative ones in words- categorical values). After training the model, the model can fairly classify on test data.
I was wondering how the internal process (mapping function) is taking place? Is the qualitative output states are getting some nominal value inside the model and after processing it is again getting converted to the categorical data? because a NN model cannot map float input values to a categorical data through hidden neurons, so what is actually happening, although the model is working fine.
If the model converts the categorical outputs into nominal ones and then start processing then on what basis it converts the categorical value into some arbitrary numerical values?
Yes, categorical values are usually being converted to numbers, and the networks learn to associate input data with these numbers. However these numbers are often further encoded, not to use only single output neuron. The most common way to do it, for unordered labels, is to add dummy output neurons dedicated to each category and use 1-of-C encoding, with 0.1 and 0.9 as target values. Output is interpreted using the Winner-take-all paradigm.
Using only one neuron and encoding categories with different numbers for unordered labels often leads to problems - as the network will treat middle categories as "averages" of the boundary categories. This however may sometimes be desired, if you have ordered categorical data.
You can find very good explanation of this issue in this part of the online Neural Network FAQ.
The neural net's computations all take place on continuous values. To do multiclass classification with discrete output, its final layer produces a vector of such values, one for each class. To make a discrete class prediction, take the index of the maximum element in that vector.
So if the final layer in a classification network for four classes predicts [0 -1 2 1], then the third element of the vector is the largest and the third class is selected. Often, these values are also constrained to form a probability distribution by means of a softmax activation function.

How to do text classification with label probabilities?

I'm trying to solve a text classification problem for academic purpose. I need to classify the tweets into labels like "cloud" ,"cold", "dry", "hot", "humid", "hurricane", "ice", "rain", "snow", "storms", "wind" and "other". Each tweet in training data has probabilities against all the label. Say the message "Can already tell it's going to be a tough scoring day. It's as windy right now as it was yesterday afternoon." has 21% chance for being hot and 79% chance for wind. I have worked on the classification problems which predicts whether its wind or hot or others. But in this problem, each training data has probabilities against all the labels. I have previously used mahout naive bayes classifier which take a specific label for a given text to build model. How to convert these input probabilities for various labels as input to any classifier?
In a probabilistic setting, these probabilities reflect uncertainty about the class label of your training instance. This affects parameter learning in your classifier.
There's a natural way to incorporate this: in Naive Bayes, for instance, when estimating parameters in your models, instead of each word getting a count of one for the class to which the document belongs, it gets a count of probability. Thus documents with high probability of belonging to a class contribute more to that class's parameters. The situation is exactly equivalent to when learning a mixture of multinomials model using EM, where the probabilities you have are identical to the membership/indicator variables for your instances.
Alternatively, if your classifier were a neural net with softmax output, instead of the target output being a vector with a single [1] and lots of zeros, the target output becomes the probability vector you're supplied with.
I don't, unfortunately, know of any standard implementations that would allow you to incorporate these ideas.
If you want an off the shelf solution, you could use a learner the supports multiclass classification and instance weights. Let's say you have k classes with probabilities p_1, ..., p_k. For each input instance, create k new training instances with identical features, and with label 1, ..., k, and assign weights p_1, ..., p_k respectively.
Vowpal Wabbit is one such learner that supports multiclass classification with instance weights.

Resources