OpenCV4Android SVM is not giving the correct prediction - opencv

I am new to machine learning and openCV. I have taken a set of 10 images for each emotion(neutral and happy) from Cohn-Kanade face database. Then I have extracted the facial features from each image and put them in my trainingData Matrix and assigned the label for the respective emotion (Example: 0 for neutral and 1 for happy).
I have used the RBF kernel with gamma = 0.1 and C = 1. Once trained, I am passing the facial features extracted from the live camera frames from a smartphone camera for prediction. The prediction always returns 1.
If I increase the number of training samples for neutral expression(example: 15 neutral expression images and 10 happy expression images), then the prediction always returns 0 and if there are equal number of images for each expression in the training samples, then SVM prediction always returns 1.
Why is the SVM behaving this way? How to check if I am using the right values for gamma and C? Also, does SVM depend on the resolution of training images and testing images?

I would request you to upload the SVM function so we can understand your code. Secondly, I have used SVM before and you need to normalize the training data and the labels. You should also make sure you are using the correct classifier as not all classifiers are supported. Follows this link for some tutorials http://docs.opencv.org/3.0-beta/modules/ml/doc/support_vector_machines.html
For answering your other questions, unfortunately you have to find the best combination for gamma and C yourself, which is kind of the drawback of SVM. https://www.quora.com/What-are-C-and-gamma-with-regards-to-a-support-vector-machine
Yes, the SVM does depend on the resolution as your features/feature vectors would change depending on the resolution and hence the inputs and the labels.
P.S. This should ideally be in comments but unfortunately i don't have enough points to do that.

Related

Machine Learning - Huge Only positive text dataset

I have a dataset with thousand of sentences belonging to a subject. I would like to know what would be best to create a classifier that will predict a text as "True" or "False" depending on whether they talk about that subject or not.
I've been using solutions with Weka (basic classifiers) and Tensorflow (neural network approaches).
I use string to word vector to preprocess the data.
Since there are no negative samples, I deal with a single class. I've tried one-class classifier (libSVM in Weka) but the number of false positives is so high I cannot use it.
I also tried adding negative samples but when the text to predict does not fall in the negative space, the classifiers I've tried (NB, CNN,...) tend to predict it as a false positive. I guess it's because of the sheer amount of positive samples
I'm open to discard ML as the tool to predict the new incoming data if necessary
Thanks for any help
I have eventually added data for the negative class and build a Multilineal Naive Bayes classifier which is doing the job as expected.
(the size of the data added is around one million samples :) )
My answer is based on the assumption that that adding of at least 100 negative samples for author’s dataset with 1000 positive samples is acceptable for the author of the question, since I have no answer for my question about it to the author yet
Since this case with detecting of specific topic is looks like particular case of topics classification I would recommend using classification approach with the two simple classes 1 class – your topic and another – all other topics for beginning
I succeeded with the same approach for face recognition task – at the beginning I built model with one output neuron with high level of output for face detection and low if no face detected
Nevertheless such approach gave me too low accuracy – less than 80%
But when I tried using 2 output neurons – 1 class for face presence on image and another if no face detected on the image, then it gave me more than 90% accuracy for MLP, even without using of CNN
The key point here is using of SoftMax function for the output layer. It gives significant increase of accuracy. From my experience, it increased accuracy of the MNIST dataset even for MLP from 92% up to 97% for the same model
About dataset. Majority of classification algorithms with a trainer, at least from my experience are more efficient with equal quantity of samples for each class in a training data set. In fact, if I have for 1 class less than 10% of average quantity for other classes it makes model almost useless for the detection of this class. So if you have 1000 samples for your topic, then I suggest creating 1000 samples with as many different topics as possible
Alternatively, if you don’t want to create a such big set of negative samples for your dataset, you can create a smaller set of negative samples for your dataset and use batch training with a size of batch = 2x your negative sample quantity. In order to do so, split your positive samples in n chunks with the size of each chunk ~ negative samples quantity and when train your NN by N batches for each iteration of training process with chunk[i] of positive samples and all your negative samples for each batch. Just be aware, that lower accuracy will be the price for this trade-off
Also, you could consider creation of more generic detector of topics – figure out all possible topics which can present in texts which your model should analyze, for example – 10 topics and create a training dataset with 1000 samples per each topic. It also can give higher accuracy
One more point about the dataset. The best practice is to train your model only with part of a dataset, for example – 80% and use the rest 20% for cross-validation. This cross-validation of unknown previously data for model will give you a good estimation of your model accuracy in real life, not for the training data set and allows to avoid overfitting issues
About building of model. I like doing it by "from simple to complex" approach. So I would suggest starting from simple MLP with SoftMax output and dataset with 1000 positive and 1000 negative samples. After reaching 80%-90% accuracy you can consider using CNN for your model, and also I would suggest increasing training dataset quantity, because deep learning algorithms are more efficient with bigger dataset
For text data you can use Spy EM.
The basic idea is to combine your positive set with a whole bunch of random samples, some of which you hold out. You initially treat all the random documents as the negative class, and train a classifier with your positive samples and these negative samples.
Now some of those random samples will actually be positive, and you can conservatively relabel any documents that are scored higher than the lowest scoring held out true positive samples.
Then you iterate this process until it stablizes.

opencv SVM Prediction

I am currently working on age estimation and extracting features using Gabor Filters. I have decided to classify ages using a SVM, the training is successful and it does not take a long time since the feature vectors are only about 3000 in dimensions by a 1000 training samples.
The problem is that when I want to predict an image, the result returned is always 18 (I pass the feature vector of the testing image to the predict function of the SVM). But when I predict an age from the training set the result is always correct. I do not know why this is happening. Any help will be appreciated.
Also, an observation is that whether or not I change and/ or include the SVMParams the prediction still outputs the same value.

Is Hog descriptors constructed in peopledetect.cpp?

I am new to hog, I am using opencv2.4.4 and visual studio 2010, i am running the sample peopledetect.cpp in the package and its compiling and running, but i want to understand the the source code in detail.In peopledetect.cpp is hog descriptors constructed/ already trained for peopledetection 3780 vectors are fed into svm classifier? when i try to debug the peopledetect.cpp i could only find HOGDescriptor creates hog descriptor and detector, i basically doesnt understand what this API does HOGDescriptor as i see peopledetect.cpp doesnt go through the steps of hog processing, it loads the already trained vectors to svm classifier to detect people/no people, am i wrong?. As there is no documentation about this.
Can anyone please brief about this.
The implementation of People Detection algorithm in opencv is based on HOG descriptors as features and SVM as classifier.
1. A training database (positives samples as person, negatives samples as non-person) is used to learn to SVM parameters (it computes and store the support vectors). Cross-validation is also perform (I assume) to optimize the soft margin parameter C and the kernel parameters (it could be linear kernel).
2. To detect people on testing video data, peopledetect.cpp loads the pre-learnt SVM, computes the HOG descriptors on different positions and scales, then merges the windows with high detection scores (outputs of binary SVM classifer).
Here is a good paper (inria) to start with.
Coming to more clearer answer, peopledetect.cpp goes through all the hog steps.
digging deeper i was more clear. Basically if you debug peopledetect.cpp goes through these steps.
Initially image is divided into several scales, scale0(1.05) is coefficient for detection window increase. For each scale of the image features are extracted from window and a classifier window is run, like above it follows scale-space pyramid method. So its pretty big computational process, very expensive, so opencv team has tried to parallelise for each scale.
I was baffled before why i was not able to debug/go through the steps, This parallel_for_(Range(0, (int)levelScale.size()),HOGInvoker()) creates several threads where each thread works on each scale, depends how much threads or created something like this.
because of this i was not able to debug, what i did was freeze all the threads and debug only the main thread. for different scales of the image hog processing steps are
Here in peopledetect.cpp hog and classifier window are kinda combined.In a single window(64x128) both feature extraction and running classifier takes place. After this is done for each scale of the image. There are a number of pedestrian windows of different scales are often associated with this region, this is grouped using grouprectangle() function
Training SVM consist to find parameters of the max margin between postive and negative samples.
if the same feature extraction is done for 1000+ negative and positive sample there is must be millions of features rite?
Yes. These coefficient are extracted from training databases. You don't have them. SVM stores only support vectors which are sufficient to characterise the margin. See dual form of linear SVM for example.
a number of pedestrian windows of different scales are often associated with the region
True. A merging function is apply. Different methods (such groupRectangles(..)) are available (see here) and take in arguments parameters given to detectMultiScale(..).
What i understood from different papers is that feature extraction using hog is done using several positive and negative images, these features which were extracted is fed to Linear SVM to train them,So peopledetect.cpp uses this trained linear SVM sample, so This feature extraction process is not done by peopledetect.cpp i.e HOGDescriptor::getDefaultPeopleDetector() consists of coefficients of the classifier trained for people detection. The features extracted from hog detection/window(64x128)gives a total of length 3780(4 cells x 9 bins x 7 x 15 blocks = 3780) features. These features are then used to train a linear SVM classifier. If the same feature extraction is done for 1000+ negative and positive sample there is must be millions of features rite? How do we get these co-efficients?
But The HOG descriptors are known to contain redundant information because of the different detection window sizes being used. So when the SVM classifier classifies a region as “pedestrian”, a number of pedestrian windows of different scales are often associated with the region. what peopledetect.cpp mainly does is (hog.detectMultiScale(img, found, 0, Size(8,8), Size(32,32), 1.05, 2);) The detection window is scanned across the image at all positions and scales, and conventional non-maximum suppression is run on the output pyramid to detect object instances.

CvSVM.predict() gives 'NaN' output and low accuracy

I am using CvSVM to classify only two types of facial expression. I used LBP(Local Binary Pattern) based histogram to extract features from the images, and trained using cvSVM::train(data_mat,labels_mat,Mat(),Mat(),params), where,
data_mat is of size 200x3452, containing normalized(0-1) feature histogram of 200 samples in row major form, with 3452 features each(depends on number of neighbourhood points)
labels_mat is corresponding label matrix containing only two value 0 and 1.
The parameters are:
CvSVMParams params;
params.svm_type =CvSVM::C_SVC;
params.kernel_type =CvSVM::LINEAR;
params.C =0.01;
params.term_crit=cvTermCriteria(CV_TERMCRIT_ITER,(int)1e7,1e-7);
The problem is that:-
while testing I get very bad result (around 10%-30% accuracy), even after applying with different kernel and train_auto() function.
CvSVM::predict(test_data_mat,true) gives 'NaN' output
I will greatly appreciate any help with this, it's got me stumped.
I suppose, that your classes linearly hard/non-separable in feature space you use.
May be it will be better to apply PCA to your dataset before classifier training step
and estimate effective dimensionality of this problem.
Also I think it will be userful test your dataset with other classifiers.
You can adapt for this purpose standard opencv example points_classifier.cpp.
It includes a lot of different classifiers with similar interface you can play with.
The SVM generalization power is low.In the first reduce your data dimension by principal component analysis then change your SVM kerenl type to RBF.

Large Scale Image Classifier

I have a large set of plant images labeled with the botanical name. What would be the best algorithm to use to train on this dataset in order to classify an unlabel photo? The photos are processed so that 100% of the pixels contain the plant (e.g. either closeups of the leaves or bark), so there are no other objects/empty-space/background that the algorithm would have to filter out.
I've already tried generating SIFT features for all the photos and feeding these (feature,label) pairs to a LibLinear SVM, but the accuracy was a miserable 6%.
I also tried feeding this same data to a few Weka classifiers. The accuracy was a little better (25% with Logistic, 18% with IBk), but Weka's not designed for scalability (it loads everything into memory). Since the SIFT feature dataset is a several million rows, I could only test Weka with a random 3% slice, so it's probably not representative.
EDIT: Some sample images:
Normally, you would not train on the SIFT features directly. Cluster them (using k-means) and then train on the histogram of cluster membership identifiers (i.e., a k-dimensional vector, which counts, at position i, how many features were assigned to the i-th cluster).
This way, you obtain a single output per image (and a single, k-dimensional, feature vector).
Here's the quasi-code (using mahotas and milk in Pythonn):
from mahotas.surf import surf
from milk.unsupervised.kmeans import kmeans,assign_centroids
import milk
# First load your data:
images = ...
labels = ...
local_features = [surfs(im, 6, 4, 2) for im in imgs]
allfeatures = np.concatenate(local_features)
_, centroids = kmeans(allfeatures, k=100)
histograms = []
for ls in local_features:
hist = assign_centroids(ls, centroids, histogram=True)
histograms.append(hist)
cmatrix, _ = milk.nfoldcrossvalidation(histograms, labels)
print "Accuracy:", (100*cmatrix.trace())/cmatrix.sum()
This is a fairly hard problem.
You can give BoW model a try.
Basically, you extract SIFT features on all the images, then use K-means to cluster the features into visual words. After that, use the BoW vector to train you classifiers.
See the Wikipedia article above and the references papers in that for more details.
You probably need better alignment, and probably not more features. There is no way you can get acceptable performance unless you have correspondences. You need to know what points in one leaf correspond to points on another leaf. This is one of the "holy grail" problems in computer vision.
People have used shape context for this problem. You should probably look at this link. This paper describes the basic system behind leafsnap.
You can implement the BoW model according to this Bag-of-Features Descriptor on SIFT Features with OpenCV. It is a very good tutorial to implement the BoW model in OpenCV.

Resources