How to use svmpredict in matlab? - machine-learning

[predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model [, 'libsvm_options']);
1. I am using libsvm for image classification in MATLAB. What do testing_label_vector, testing_instance_matrix, decision_values/prob_estimates, and, most importantly, accuracy mean in svmpredict?
2. If I am using it for testing to obtain an accuracy value, do I have to know the values for testing_label_vector?

(1)
testing_label_vector: the true labels of the data on which you want to test.
testing_instance_matrix: the data on which you want to test, one instance per row. The label of each instance is in testing_label_vector.
decision_values/prob_estimates: the SVM decision values for each test instance (or the class probability estimates, if the model was trained with the '-b 1' option).
accuracy: the percentage of predicted labels that agree with the true labels.
(2)
Yes. You certainly need ground truth to compute accuracy.
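A minimal usage sketch (plain MATLAB, assuming the libsvm MATLAB interface is on the path; variable names and the '-c 1 -g 0.1' options are hypothetical):

% Hypothetical data: train_data/test_data are double matrices (one instance
% per row); train_labels/test_labels are column vectors of class labels.
model = svmtrain(train_labels, train_data, '-c 1 -g 0.1');
[pred_labels, accuracy, dec_values] = svmpredict(test_labels, test_data, model);

% If you have no ground truth and only want predictions, pass dummy labels
% and ignore the reported accuracy:
dummy = zeros(size(test_data, 1), 1);
[pred_labels, ~, ~] = svmpredict(dummy, test_data, model);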

Related

Do I need to add ReLU function before last layer to predict a positive value?

I am developing a model using linear regression to predict age. I know that age ranges from 0 to 100, so any value in that range is possible. I used a 1x1 convolution in the last layer to predict the real value. Do I need to add a ReLU after the output of the 1x1 convolution to guarantee the predicted value is positive? Currently I did not add a ReLU, and some predicted values become negative, like -0.02 or -0.4…
There's no compelling reason to use an activation function for the output layer; typically you just want to use a reasonable/suitable loss function directly with the penultimate layer's output. Specifically, a ReLU doesn't solve your problem (or at most only solves 'half' of it), since it can still predict above 100. In this case (predicting a continuous outcome) there are a few standard loss functions, like squared error or the L1 norm.
If you really want an activation function for this final layer and are concerned about always predicting within a bounded interval, you could try scaling up the sigmoid function (to between 0 and 100). However, there's nothing special about the sigmoid here: any bounded function, e.g. the CDF of a continuous random variable, could be used similarly. For optimization, though, something easily differentiable is important.
Why not start with something simple like squared-error loss? It's always possible to 'clamp' out-of-range predictions to [0, 100] (we could give this a fancy name like 'doubly ReLU') when you actually need to make predictions (as opposed to during training/testing). But if you're getting lots of such errors, the model might have more fundamental problems.
Even for a regression problem, it can be good (for optimisation) to use a sigmoid layer before the output (giving a prediction in the [0, 1] range), followed by a denormalization step (here, if you think the maximum age is 100, just multiply by 100).
This tip is explained in this fast.ai course.
I personally think these lessons are excellent.
You should use a sigmoid activation function and normalize the targets to the [0, 1] range. This solves both issues: the output is positive and bounded.
You can then easily denormalize the network outputs to get a prediction in the [0, 100] range.
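As a rough sketch of the two ideas above (plain MATLAB, with made-up output values):

z = [-3.2 0.5 4.1];                       % hypothetical raw outputs of the last layer
age_sigmoid = 100 ./ (1 + exp(-z));       % scaled sigmoid: bounded to (0, 100)

raw_pred = [-0.4 52.3 104.8];             % hypothetical unbounded predictions
age_clamped = min(max(raw_pred, 0), 100); % the 'doubly ReLU' clamp to [0, 100]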

Normalize Machine Learning Inputs

I have a set of inputs with around 5000 features, whose values vary from 0.005 to 9,000,000. Each feature stays within a similar range across samples (a feature with values around 10 will not also take values around 0.1).
I am trying to apply linear regression to this data set; however, the wide range of input values is inhibiting effective gradient descent.
What is the best way to handle this variance? If normalization is best, please include details on the best way to implement it.
Thanks!
Simply perform it as a pre-processing step. You can do it as follows:
1) Calculate the mean of each feature over the training set and store it. Be careful not to confuse the feature mean with the sample mean: you want a vector of size [number_of_features] (5000ish).
2) Calculate the standard deviation of each feature over the training set and store it. This is also of size [number_of_features].
3) Standardize each training and testing entry as:
updated = (original_vector - mean_vector) / std_vector
That's it!
The code will look like:
import numpy as np

# train_data shape: [train_length, 5000]
# test_data shape:  [test_length, 5000]
mean = np.mean(train_data, axis=0)  # per-feature mean, shape [5000]
std = np.std(train_data, axis=0)    # per-feature std, shape [5000]
normalized_train_data = (train_data - mean) / std
normalized_test_data = (test_data - mean) / std

SVM machine learning - How to define the target in the training set?

I am working on a project where I have to implement the SVM machine learning algorithm. I am trying to predict forearm movement intention. I am using an accelerometer (attached to my forearm) to measure the angle change for the x, y, z axes. I have never used machine learning before. The problem I am having is that I do not know exactly how to structure the training set. I know the angle changes for each axis, and I know, for example, that if x = 45 degrees, y = 65 degrees, and z = 30 degrees, the gesture I performed was flexion. I would like to implement 3 gestures. So the data I have is:
x    y    z    Target
20   60   90   flexion
100  63   23   internal rotation
89   23   74   twist
...
I have a file with around 2000 entries. I know I have to normalize the training set so the data are scaled; I would like to scale it to the range [0.1, 0.9]. The problem is that I do not know how to represent the target in my training set. Can I just use arbitrary numbers, such as 1 for flexion, 2 for internal rotation, and 3 for twist?
Also, once the training is completed, can I make predictions based on the values for x, y, z only, without having to supply the target value? Is my understanding correct?
First of all, I suggest that you not scale or code your data; leave it in human-readable form. Rather, write front-end routines to perform these tasks, and back-end routines to reverse the process. Also have internal routines that can display the data in its internal form. Doing this up front will greatly ease your debugging later on.
Yes, you will likely want to code your classifications as 1, 2, 3. Another possibility is a "one-hot" ordered triple: (1,0,0), (0,1,0), or (0,0,1). However, most SVM algorithms are set up for scalar output. Also note that the typical treatment for a multi-class problem is to run three separate SVM calculations, "one against all": for each class, you take that class as the "plus" data and all the others as the "minus" data.
Scaling data is important for regression convergence. If you're building your SVM via complete and direct computation of the support vectors, you don't need to scale numbers that are in compatible ranges, such as these. If you're doing it by some sort of iterative approximation, you still won't need it for this data -- but keep it in mind for the future.
Yes, prediction takes only the inputs x, y, z and returns the target classification. That's the purpose of supervised learning: summarize experience to classify the future.
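A sketch of the coding and scaling steps described above (plain MATLAB; the sample values are hypothetical):

X = [20 60 90; 100 63 23; 89 23 74];   % angle measurements, one sample per row
y = [1; 2; 3];                          % 1 = flexion, 2 = internal rotation, 3 = twist

% Min-max scale each feature (column) to [0.1, 0.9] using training-set extremes
Xmin = min(X, [], 1);
Xmax = max(X, [], 1);
Xscaled = 0.1 + 0.8 * bsxfun(@rdivide, bsxfun(@minus, X, Xmin), Xmax - Xmin);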

Normalizing feature values for SVM

I've been playing with some SVM implementations, and I am wondering: what is the best way to normalize feature values to fit into one range (from 0 to 1)?
Let's suppose I have 3 features with values in the ranges:
3 to 5
0.02 to 0.05
10 to 15
How do I convert all of those values into the range [0, 1]?
What if, during training, the highest value of feature number 1 that I encounter is 5, and after I begin to use my model on much bigger datasets, I stumble upon values as high as 7? Then in the converted range they would exceed 1...
How do I normalize values during training to account for the possibility of "values in the wild" exceeding the highest (or lowest) values the model saw during training? How will the model react to that, and how do I make it work properly when that happens?
Besides the scaling-to-unit-length method provided by Tim, standardization is the approach most often used in machine learning. Note that when your test data arrives, it makes more sense to use the mean and standard deviation from your training samples to do the scaling. If you have a very large amount of training data, it is reasonably safe to assume the features are approximately normally distributed, so the chance that new test data falls far out of range won't be that high.
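In MATLAB, that standardization might look like this (train_data and test_data are hypothetical matrices with one sample per row):

mu = mean(train_data, 1);        % per-feature mean, from the training set only
sigma = std(train_data, [], 1);  % per-feature std, from the training set only
train_norm = bsxfun(@rdivide, bsxfun(@minus, train_data, mu), sigma);
test_norm  = bsxfun(@rdivide, bsxfun(@minus, test_data, mu), sigma);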
You normalise a vector by converting it to a unit vector. This trains the SVM on the relative values of the features, not their magnitudes. The normalisation algorithm will work on vectors with any values.
To convert to a unit vector, divide each value by the length of the vector. For example, the vector [4 0.02 12] has a length of 12.6491. The normalised vector is then [4/12.6491, 0.02/12.6491, 12/12.6491] = [0.316, 0.0016, 0.949].
If "in the wild" we encounter the vector [400 2 1200], it will normalise to the same unit vector as above. The magnitude of the vector is "cancelled out" by the normalisation, and we are left with relative values between 0 and 1.
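The same computation in MATLAB, using the example vector from above:

v = [4 0.02 12];
u = v / norm(v)         % norm(v) is about 12.6491, so u is about [0.316 0.0016 0.949]

w = [400 2 1200];       % 100x the magnitude, same direction
w / norm(w)             % normalises to the same unit vector as u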

Gaussian visible units in rbm

I want to implement a Gaussian RBM. For that I want to give the data zero mean and unit variance. My data is the MNIST dataset, prepared following the code at the link below.
Visit http://www.cs.toronto.edu/~hinton/code/makebatches.m
I implemented it in the way shown below, but my data becomes NaN. It becomes NaN after dividing the data by the standard deviation.
for epoch = epoch:maxepoch,
  fprintf(1, 'epoch %d \r', epoch);
  errsum = 0;
  for batch = 1:numbatches,
    fprintf(1, 'epoch %d batch %d \r', epoch, batch);
    % START POSITIVE PHASE
    data = batchdata(:,:,batch);
    % zero mean and unit variance
    data_mean = mean(data, 1);
    data = bsxfun(@minus, data, data_mean);
    data_std = std(data1, [], 1);
    data = bsxfun(@rdivide, data, data_std);
I tried this with a small set of examples, and it works well. What could be the reason for it becoming NaN?
How do I get rid of this and make the input Gaussian with zero mean and unit variance?
I would recommend normalizing the mean and variance of your data before starting the GBRBM training. This way you'd be able to check the batchdata variable manually in the MATLAB workspace.
While training a GBRBM, I often see NaN as the training/validation error when my learning rate is too high. It should help to set the learning rate to 0.001 or below.
You appear to be using an undefined variable, data1, in your "data_std = ..." line, as opposed to data.
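Building on that typo fix, here is a sketch of the normalization with the division guarded. This assumes (my assumption, not stated in the answers above) that some MNIST pixels are constant within a batch, so their standard deviation is zero and the division produces 0/0 = NaN:

data_mean = mean(data, 1);
data = bsxfun(@minus, data, data_mean);
data_std = std(data, [], 1);          % 'data', not the undefined 'data1'
data_std(data_std == 0) = 1;          % constant features would otherwise give 0/0 = NaN
data = bsxfun(@rdivide, data, data_std);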
