Does CvSVM scale the training data? - opencv

I am using OpenCV's CvSVM to learn SVM classifiers for 29 classes.
The application is face recognition, and I divide the face image into a 3x6 grid. For each block in the grid, I train an SVM classifier on the SURF features extracted from that block.
I read here http://www.csie.ntu.edu.tw/~cjlin/talks.html that it is very important to scale the training and testing data in the same way.
Does CvSVM scale the data? If not, does OpenCV provide any function I can use to do the scaling?

There is a very good cross-platform library for applying machine learning algorithms in practice:
http://www.nickgillian.com/software/grt
I don't know whether OpenCV provides a way to scale training/test data to achieve a better classification result, but with GRT you can do it:
https://github.com/nickgillian/grt/blob/master/GRT/ClassificationModules/SVM/SVM.h
...
#param bool useScaling: sets if the training/prediction data will be scaled to the default range of [-1. 1.]. The SVM algorithm commonly achieves a better classification result if scaling is turned on. The default useScaling value is useScaling=true
...
SVM(UINT kernelType = LINEAR_KERNEL,UINT svmType = C_SVC,bool useScaling = true,bool useNullRejection = false,bool useAutoGamma = true,double gamma = 0.1,UINT degree = 3,double coef0 = 0,double nu = 0.5,double C = 1,bool useCrossValidation = false,UINT kFoldValue = 10);
This is also a wrapper class to libsvm.
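To make the scaling idea concrete, here is a minimal sketch of min-max scaling to [-1, 1] in plain Python. It is not GRT's or OpenCV's implementation; the helper names `fit_minmax` and `scale_rows` and the toy data are made up for illustration. The key point is that the minimum and maximum are computed from the training data only and then reused unchanged for the test data.

```python
def fit_minmax(rows):
    """Return per-feature (min, max) pairs computed from the training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def scale_rows(rows, params, lower=-1.0, upper=1.0):
    """Map each feature to [lower, upper] using previously fitted (min, max) pairs."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, params):
            if hi == lo:
                # Constant feature in the training set: place it mid-range.
                new_row.append((lower + upper) / 2.0)
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled


# Hypothetical toy data: two features per sample.
train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
test = [[2.5, 40.0]]

params = fit_minmax(train)                # fitted on the training data only
train_scaled = scale_rows(train, params)
test_scaled = scale_rows(test, params)    # same parameters reused for the test data
```

Note that test values may land outside [-1, 1] when they exceed the range seen during training; that is expected, since the test data must not influence the scaling parameters.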

Related

Combining Neural Networks Pytorch

I have 2 images as input, x1 and x2, and I am trying to use convolution as a similarity measure. The idea is that the learned weights substitute for more traditional measures of similarity (cross correlation, NN, ...). I define my forward function as follows:
def forward(self, x1, x2):
    out_conv1a = self.conv1(x1)
    out_conv2a = self.conv2(out_conv1a)
    out_conv3a = self.conv3(out_conv2a)
    out_conv1b = self.conv1(x2)
    out_conv2b = self.conv2(out_conv1b)
    out_conv3b = self.conv3(out_conv2b)
Now for the similarity measure:
out_cat = torch.cat([out_conv3a, out_conv3b],dim=1)
futher_conv = nn.Conv2d(out_cat)
My question is as follows:
1) Would depthwise/separable convolutions, as in the Google paper, yield any advantage over a 2D convolution of the concatenated input? For that matter, can convolution serve as a similarity measure, given that cross correlation and convolution are very similar?
2) It is my understanding that the groups=2 option in conv2d would provide 2 separate inputs to train weights with, in this case each of the previous networks' weights. How are these combined afterwards?
For a basic concept see here.
Using a nn.Conv2d layer you assume weights are trainable parameters. However, if you want to filter one feature map with another, you can dive deeper and use torch.nn.functional.conv2d to explicitly define both input and filter yourself:
out = torch.nn.functional.conv2d(out_conv3a, out_conv3b)
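PyTorch's conv2d computes a cross-correlation (the kernel is not flipped), so using one feature map as the filter for another amounts to sliding a template over an image and taking dot products. The following is a plain-Python sketch of that operation for a single channel and no batch dimension. The function name `cross_correlate2d` and the toy arrays are illustrative only and are not part of PyTorch.

```python
def cross_correlate2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image without flipping it."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            total = 0.0
            # Dot product between the kernel and the image window at (r, c).
            for dr in range(kh):
                for dc in range(kw):
                    total += image[r + dr][c + dc] * kernel[dr][dc]
            row.append(total)
        out.append(row)
    return out


# Hypothetical feature map containing the template at offset (1, 1).
feature_map = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]

response = cross_correlate2d(feature_map, template)
peak = max(max(row) for row in response)
```

The response is largest where the window matches the template, which is why the raw output can be read as a (non-normalized) similarity map.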

Scaling the data in a decision tree changed my results?

I know that a decision tree should not be affected by scaling the data, but when I scale the data in my decision tree pipeline it gives me worse performance (recall, precision and accuracy).
When I don't scale, all the performance metrics are much better. How can this be?
Note: I use GridSearchCV, but I don't think the cross-validation is the reason for my problem. Here is my code:
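from sklearn import tree
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler

scaled = MinMaxScaler()
pca = PCA()
bestK = SelectKBest()
# FeatureUnion runs the transformers side by side and concatenates their outputs
combined_transformers = FeatureUnion([("scale", scaled), ("best", bestK),
                                      ("pca", pca)])
clf = tree.DecisionTreeClassifier(class_weight="balanced")
pipeline = Pipeline([("features", combined_transformers), ("tree", clf)])
param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__best__k=[1, 2, 3],
                  tree__min_samples_split=[4, 5],
                  tree__max_depth=[4, 5],
                  )
grid_search = GridSearchCV(pipeline, param_grid=param_grid, scoring='f1')
# features and labels are assumed to be defined already
grid_search.fit(features, labels)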
With the scale function MinMaxScaler() my performance is:
f1 = 0.837209302326
recall = 1.0
precision = 0.72
accuracy = 0.948148148148
But without scaling:
f1 = 0.918918918919
recall = 0.944444444444
precision = 0.894736842105
accuracy = 0.977777777778
I am not familiar with scikit-learn, so excuse me if I misunderstand something.
First of all, does PCA standardize features? If it does not, it will give different results for scaled and non-scaled input.
Second, due to the randomness in splitting the samples, CV may give different results on each run. This affects the results especially for small sample sizes. In addition, if your sample size is small, the difference between the two results may not be significant after all.
I have the following suggestions:
1) Scaling can be treated as an additional hyperparameter, which can be optimized by CV.
2) Perform an extra CV (called nested CV) or a hold-out to estimate performance. This is done by keeping a test set, selecting your model using CV on the training data, and then evaluating its performance on the test set (in the case of nested CV, you do this repeatedly for all folds and average the performance estimates). Of course, your final model should be trained on the whole dataset. In general, you should not use the performance estimate from the CV used for model selection, as it will be overly optimistic.
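On the first point of this answer: PCA picks directions of maximum variance, so it is sensitive to the units of each feature unless the features are rescaled first (scikit-learn's PCA centers the data but does not rescale it). The sketch below illustrates this for two features in plain Python, using the closed-form angle of the first principal axis of a 2x2 covariance matrix. The data and helper names are hypothetical.

```python
import math


def principal_axis_angle(xs, ys):
    """Angle in degrees of the first principal axis of 2-D data, from its covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Closed form for the leading eigenvector direction of a 2x2 covariance matrix.
    return math.degrees(0.5 * math.atan2(2.0 * cov_xy, var_x - var_y))


def minmax(values):
    """Rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy data: x spans about 1000 units, y spans about 1 unit.
xs = [0.0, 250.0, 500.0, 750.0, 1000.0]
ys = [0.0, 0.9, 0.1, 1.0, 0.5]

angle_raw = principal_axis_angle(xs, ys)                     # almost aligned with x
angle_scaled = principal_axis_angle(minmax(xs), minmax(ys))  # tilts strongly toward y
```

On the raw data the first component is essentially the large-range feature; after min-max scaling it becomes a mix of both. Any model fed with PCA outputs, a tree included, therefore sees different inputs in the two cases.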

Variational Autoencoder for Feature Extraction

I would like to ask whether it is possible (or rather, whether it makes any sense) to use a variational autoencoder for feature extraction. I ask because in the encoding part we sample from a distribution, which means that the same input can have different encodings (due to the stochastic nature of the sampling process). Thanks!
Yes, the feature extraction goal is the same for VAEs as for sparse autoencoders.
Once you have an encoder, plug in a classifier on the extracted features.
Yes, the output of the encoder network can be used as your features.
Just think about this: using the output of the encoder network as input, the decoder network can generate an image quite like your original image. Therefore the output of the encoder network captures most of the information in your original image. In other words, it holds the most important features of your original image, the ones that distinguish it from other images.
The only thing you want to pay attention to is that a variational autoencoder is a stochastic feature extractor, while a feature extractor is usually deterministic. You can either use the mean and variance as your extracted features, or use a Monte Carlo method, drawing from the Gaussian distribution defined by the mean and variance to get "sampled extracted features".
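The two options in the last paragraph can be sketched as follows. This is only an illustration in plain Python. It assumes the encoder outputs a mean vector and a log-variance vector per input (a common parametrization, but an assumption here), and the values of `mu` and `log_var` are made up rather than coming from a trained model.

```python
import math
import random


def deterministic_features(mu, log_var):
    """Use the encoder outputs themselves (mean and variance) as the feature vector."""
    return list(mu) + [math.exp(lv) for lv in log_var]


def sampled_features(mu, log_var, n_samples, rng):
    """Draw n_samples latent codes z = mu + sigma * eps, with eps ~ N(0, 1)."""
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(n_samples)
    ]


# Hypothetical encoder output for one input (2-D latent space).
mu = [0.5, -1.0]
log_var = [0.0, math.log(0.25)]

rng = random.Random(0)  # fixed seed so the sampled features are reproducible
features = deterministic_features(mu, log_var)
samples = sampled_features(mu, log_var, 5000, rng)
sample_mean = [sum(z[i] for z in samples) / len(samples) for i in range(len(mu))]
```

The deterministic variant gives the same features every time for the same input. The sampled variant gives a cloud of codes centered on the mean, so its average converges to `mu` as the number of samples grows.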
Yes, you can.
I used the below code to extract the important features from my dataset.
prostate_df <- read.csv('your_data')
prostate_df <- prostate_df[,-1] # first column.
train_df<-prostate_df
outcome_name <- 'subtype' # my label column
feature_names <- setdiff(names(prostate_df), outcome_name)
library(h2o)
localH2O = h2o.init()
prostate.hex<-as.h2o(train_df, destination_frame="train.hex")
prostate.dl = h2o.deeplearning(x = feature_names,
  #y="subtype",
  training_frame = prostate.hex,
  model_id = "AE100",
  # input_dropout_ratio = 0.3, #Quite high,
  #l2 = 1e-5, #Quite high
  autoencoder = TRUE,
  #validation_frame = prostate.hex,
  #reproducible = T,seed=1,
  # note: the qplot call further down references DF.L1.C2, which presumably needs at least 2 hidden units
  hidden = c(1), epochs = 700,
  #activation = "Tanh",
  #activation ="TanhWithDropout",
  activation ="Rectifier",
  #activation ="RectifierWithDropout",
  standardize = TRUE,
  #regression_stop = -1,
  #stopping_metric="MSE",
  train_samples_per_iteration = 0,
  variable_importances=TRUE
)
label1<-ncol(train_df)
train_supervised_features2 = h2o.deepfeatures(prostate.dl, prostate.hex, layer=1)
plotdata = as.data.frame(train_supervised_features2)
plotdata$label = as.character(as.vector(train_df[,label1]))
library(ggplot2)
qplot(DF.L1.C1, DF.L1.C2, data = plotdata, color = label, main = "Cancer Normal Pathway data ")
prostate.anon = h2o.anomaly(prostate.dl, prostate.hex, per_feature=FALSE)
head(prostate.anon)
err <- as.data.frame(prostate.anon)
h2o.scoreHistory(prostate.dl)
head(h2o.varimp(prostate.dl),10)
h2o.varimp_plot(prostate.dl)

How do you plot learning curves for Random Forest models?

Following Andrew Ng's machine learning course, I'd like to try his method of plotting learning curves (cost versus number of samples) in order to evaluate the need for additional data samples. However, with Random Forests I'm confused about how to plot a learning curve. Random Forests don't seem to have a basic cost function like, for example, linear regression so I'm not sure what exactly to use on the y axis.
You can use this function to plot the learning curve of any estimator (including a random forest).
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

def learning_curves(estimator, data, features, target, train_sizes, cv):
    # learning_curve refits the estimator on growing subsets and cross-validates each fit
    train_sizes, train_scores, validation_scores = learning_curve(
        estimator, data[features], data[target], train_sizes = train_sizes,
        cv = cv, scoring = 'neg_mean_squared_error')
    # scores are negated MSE values, so flip the sign to get errors
    train_scores_mean = -train_scores.mean(axis = 1)
    validation_scores_mean = -validation_scores.mean(axis = 1)
    plt.plot(train_sizes, train_scores_mean, label = 'Training error')
    plt.plot(train_sizes, validation_scores_mean, label = 'Validation error')
    plt.ylabel('MSE', fontsize = 14)
    plt.xlabel('Training set size', fontsize = 14)
    title = 'Learning curves for a ' + str(estimator).split('(')[0] + ' model'
    plt.title(title, fontsize = 18, y = 1.03)
    plt.legend()
    plt.ylim(0,40)
Plotting the learning curves using this function:
from sklearn.ensemble import RandomForestRegressor
# data, features, target and train_sizes are assumed to be defined already
plt.figure(figsize = (16,5))
model = RandomForestRegressor()
plt.subplot(1,2,1)  # left panel of a 1x2 grid
learning_curves(model, data, features, target, train_sizes, 5)
plt.show()
It might be possible that you're confusing a few categories here.
To begin with, in machine learning, the learning curve is defined as
Plots relating performance to experience.... Performance is the error rate or accuracy of the learning system, while experience may be the number of training examples used for learning or the number of iterations used in optimizing the system model parameters.
Both random forests and linear models can be used for regression or classification.
For regression, the cost is usually a function of the L2 norm (although sometimes the L1 norm) of the difference between the prediction and the signal.
For classification, the cost is usually the misclassification rate or log loss.
The point is that it's not a question of whether the underlying mechanism is a linear model or a forest. You should decide what type of problem it is, and what's the cost function. After deciding that, plotting the learning curve is just a function of the signal and the predictions.
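As a rough sketch of that last point, the learning curve below is computed only from the true values and the predictions, with the cost function passed in as a parameter. Everything here is illustrative: the helper names are made up, and `fit_mean` is a deliberately trivial stand-in for whatever model (random forest or otherwise) you would actually train.

```python
def mse(y_true, y_pred):
    """Regression cost: mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true, y_pred):
    """Classification cost: fraction of mismatched labels."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def learning_curve_points(fit, cost, X_train, y_train, X_val, y_val, sizes):
    """For each training-set size, return (size, training cost, validation cost)."""
    points = []
    for n in sizes:
        predict = fit(X_train[:n], y_train[:n])  # fit returns a predict function
        train_cost = cost(y_train[:n], [predict(x) for x in X_train[:n]])
        val_cost = cost(y_val, [predict(x) for x in X_val])
        points.append((n, train_cost, val_cost))
    return points


def fit_mean(X, y):
    """Toy stand-in model (hypothetical): always predicts the training mean."""
    mean = sum(y) / len(y)
    return lambda x: mean


# Hypothetical toy regression data.
X_train = [[i] for i in range(8)]
y_train = [1.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
X_val = [[100], [101]]
y_val = [2.0, 2.0]

curve = learning_curve_points(fit_mean, mse, X_train, y_train, X_val, y_val, [1, 2, 8])
```

For a classifier, pass `error_rate` (or a log loss) as `cost` instead of `mse`; the rest of the loop stays the same, which is the sense in which the curve does not depend on the underlying model type.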

CvSVM does not train when svm_type is NU_SVC

I am extracting SURF feature descriptors from face images.
Num of Classes = 29 (So class_labels in trainingData are from 1-29)
Num of training images = 3000+
I want to use CvSVM::NU_SVC in OpenCV 2.4.8 to train SVM classifier on the 64-D feature vectors.
But when I call the "train" function, it returns "false". Basically, it doesn't do the training.
I followed a procedure very similar to
http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
Am I missing anything?
