SVM for HOG descriptors in OpenCV

I am trying to classify the yard digits on a football field. I am able to detect them well (using a different method). I have a minimal bounding box drawn around the tens-place digits '1,2,3,4,5'. My goal is to classify them.
I've been trying to train an SVM classifier on HOG features I extract from the training set. A small subset of my training digits are here:
While training, I visualize my HOG descriptors and they look correct. I use a 64x128 training window and the other default parameters that OpenCV's HOGDescriptor uses.
Once I process my images (50 samples per class, 5 classes), I have a 250x3780 training matrix and a 1x250 label vector holding the class labels, which I feed to a CvSVM object. Here is where I have a problem.
I tried using the default CvSVMParams() with CvSVM. Terrible performance when tested on the training set itself!
I tried customizing my CvSVMParams like this:
CvSVMParams params = CvSVMParams();
params.svm_type = CvSVM::EPS_SVR;
params.kernel_type = CvSVM::POLY;
params.C = 1; params.p = 0.5; = 1;
and different variations of these parameters, and my SVM classifier performs terribly even when I test on the training set!
Can somebody help me out with parameterizing my SVM for this 5-class classifier?
I don't understand which kernel and which SVM type I should use for this problem. Also, how in the world am I supposed to find the values of C, p and degree for my SVM?
I would assume this is an extremely easy classification problem, since all my objects are nicely bounded in a box, the resolution is fairly good, and the classes (the digits 1,2,3,4,5) are fairly distinct in appearance. I don't understand why my SVM is doing so poorly. What am I missing here?

A priori and without experimentation, it's very hard to give you good parameters, but I can give you some ideas.
First, you want to model a multi-class classifier but you are using a regression algorithm (EPS_SVR). Not that you can't do that, but it is usually easier to start with C-SVC (CvSVM::C_SVC).
Second, I would recommend using an RBF kernel instead of a polynomial one. A polynomial kernel is very hard to get right, and RBF usually does a better job out of the box.
Third, I would play with several values of C. Don't be shy: try a bigger C (such as 100), which penalizes training errors more heavily and forces the algorithm to fit the training set more tightly. It can lead to overfitting, but if you can't even make the algorithm learn the training set, that's not your immediate problem.
Fourth, I would reduce the dimensions of the images at first and then, once you have a more stable model, try the original dimensions again.
I really recommend reading the LibSVM guide, which is very easy to follow.
Hope it helps!
I forgot to mention that a good way to pick parameters for an SVM is to perform cross-validation.
I know it's silly because it's in the title of the question, but I didn't realize you were using HOG descriptors until you pointed it out in the comments.
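As a rough illustration of the advice above (C-SVC, RBF kernel, parameters picked by cross-validation), here is a minimal sketch in Python with scikit-learn rather than the CvSVM API from the question. The data is a synthetic stand-in, not real HOG descriptors, and the grid values and the 40-dimensional feature size are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for HOG descriptors: 5 classes, 50 samples each.
# (Real 64x128 HOG vectors would have 3780 dimensions; 40 keeps this fast.)
n_classes, n_per_class, n_dims = 5, 50, 40
centers = rng.normal(scale=3.0, size=(n_classes, n_dims))
X = np.vstack([c + rng.normal(size=(n_per_class, n_dims)) for c in centers])
y = np.repeat(np.arange(1, n_classes + 1), n_per_class)  # digit labels 1..5

# C-SVC with an RBF kernel; features are standardized first.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Coarse exponential grid over C and gamma (assumed values).
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [1e-4, 1e-3, 1e-2, 1e-1],
}
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
cv_accuracy = search.best_score_
train_accuracy = search.score(X, y)
print(best_params, cv_accuracy, train_accuracy)
```

If the training accuracy stays poor even for the best grid point, the problem is more likely in the features or labels than in the SVM parameters.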


Estimating parameters in multivariate classification

Newbie here typesetting my question, so excuse me if this doesn't work.
I am trying to build a Bayesian classifier for a multivariate classification problem where the input is assumed to have a multivariate normal distribution. I chose to use a discriminant function defined as log(likelihood * prior).
However, from the distribution,
$$f(x \mid \mu,\Sigma) = (2\pi)^{-Nd/2}\det(\Sigma)^{-N/2}\exp\left[-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right]$$
I encounter a term $-\log(\det(S_i))$, where $S_i$ is my sample covariance matrix for a specific class $i$. Since my input actually represents square image data, my $S_i$ picks up a lot of correlation, resulting in $\det(S_i)$ being zero. Then my discriminant functions all turn Inf, which is disastrous for me.
I know there must be a lot of things going wrong here; is anyone willing to help me out?
UPDATE: Can anyone help me get the formula working?
I will not analyze the concept, as it is not very clear to me what you are trying to accomplish here and I do not know the dataset, but regarding the problem with the covariance matrix:
The most obvious solution for data where you need a covariance matrix and its determinant, but computing them is not numerically feasible, is to use some kind of dimensionality reduction technique to capture the most informative dimensions and simply discard the rest. One such method is Principal Component Analysis (PCA), which, applied to your data and truncated after, for example, 5-20 dimensions, would yield a reduced covariance matrix with a non-zero determinant.
PS. It may be a good idea to post this question on Cross Validated
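The PCA truncation described above can be sketched with NumPy alone. This is a minimal illustration on made-up data (a hypothetical 16x16 image flattened to 256 pixels, driven by a few latent factors), and the choice of k = 5 is just one value from the 5-20 range mentioned, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: 30 samples of a 16x16 "image" (256 pixels) whose
# pixels are driven by only 8 latent factors, so they are strongly correlated.
n_samples, n_pixels, n_latent = 30, 256, 8
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_pixels))
X = latent @ mixing + 0.01 * rng.normal(size=(n_samples, n_pixels))

# The full sample covariance is rank deficient (rank <= n_samples - 1).
S_full = np.cov(X, rowvar=False)
rank_full = np.linalg.matrix_rank(S_full)

# PCA via SVD of the centered data, truncated to k components.
k = 5
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:k]              # (k, n_pixels)
Z = (X - mean) @ components.T    # reduced data, (n_samples, k)

# Covariance in the reduced space is k x k with a finite log-determinant.
S_reduced = np.cov(Z, rowvar=False)
sign, logdet = np.linalg.slogdet(S_reduced)
print(rank_full, S_reduced.shape, sign, logdet)
```

Working with `slogdet` instead of `det` also avoids overflow or underflow when the determinant itself is extremely large or small.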
You probably do not have enough data to infer the parameters in a space of dimension d. Typically, the way to get around this is to take a MAP estimate as opposed to an ML estimate.
For the multivariate normal, the conjugate prior is a normal-inverse-Wishart distribution. The MAP estimate adds the matrix parameter of the inverse Wishart distribution to the ML covariance matrix estimate and, if chosen correctly, will get rid of the singularity problem.
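A simplified way to see the effect is to blend the sample covariance with a scaled identity matrix, which plays a role similar to the inverse-Wishart matrix parameter. The sketch below is that simplification, not the exact MAP formula; the helper names and the blending weight `alpha` are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fewer samples than dimensions -> singular sample covariance.
n_samples, d = 20, 64
X = rng.normal(size=(n_samples, d))
S = np.cov(X, rowvar=False)

def shrink_covariance(S, alpha):
    """Blend S with a scaled identity; alpha in (0, 1] is an assumed tuning knob."""
    d = S.shape[0]
    target = (np.trace(S) / d) * np.eye(d)
    return (1.0 - alpha) * S + alpha * target

def log_gaussian_discriminant(x, mean, cov, log_prior):
    """log(likelihood * prior) for one class, dropping the constant 2*pi term."""
    _, logdet = np.linalg.slogdet(cov)
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * logdet - 0.5 * maha + log_prior

S_reg = shrink_covariance(S, alpha=0.1)
sign_reg, logdet_reg = np.linalg.slogdet(S_reg)
score = log_gaussian_discriminant(X[0], X.mean(axis=0), S_reg, np.log(0.5))
print(np.linalg.matrix_rank(S), sign_reg, logdet_reg, score)
```

With any alpha > 0 the blended matrix is positive definite, so the log-determinant term in the discriminant stays finite.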
If you are actually trying to create a classifier for normally distributed data, and not just doing an experiment, then a better way to do this would be with a discriminative method. The decision boundary between multivariate normal classes is, in general, quadratic, so just use a quadratic kernel in conjunction with an SVM.

CvSVM.predict() gives 'NaN' output and low accuracy

I am using CvSVM to classify only two types of facial expression. I used LBP (Local Binary Pattern) based histograms to extract features from the images, and trained using CvSVM::train(data_mat,labels_mat,Mat(),Mat(),params), where
data_mat is of size 200x3452, containing the normalized (0-1) feature histograms of 200 samples in row-major form, with 3452 features each (this depends on the number of neighbourhood points)
labels_mat is the corresponding label matrix, containing only the two values 0 and 1.
The parameters are:
CvSVMParams params;
params.svm_type =CvSVM::C_SVC;
params.kernel_type =CvSVM::LINEAR;
params.C =0.01;
The problems are:
while testing I get very bad results (around 10%-30% accuracy), even after trying different kernels and the train_auto() function.
CvSVM::predict(test_data_mat,true) gives 'NaN' output
I would greatly appreciate any help with this; it's got me stumped.
I suppose that your classes are hard to separate linearly, or not linearly separable at all, in the feature space you use.
It may be better to apply PCA to your dataset before the classifier training step
and estimate the effective dimensionality of the problem.
Also, I think it would be useful to test your dataset with other classifiers.
You can adapt the standard OpenCV example points_classifier.cpp for this purpose.
It includes a lot of different classifiers with a similar interface that you can play with.
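One common way to estimate effective dimensionality with PCA is to count how many principal components are needed to explain most of the variance. The sketch below uses NumPy on synthetic data standing in for the histogram matrix; the 95% threshold and the data shape are assumptions, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for a 200 x 3452 histogram matrix: 200 samples in a
# 300-dimensional space whose variance is concentrated in 10 directions.
n_samples, n_features, n_strong = 200, 300, 10
basis = rng.normal(size=(n_strong, n_features))
X = rng.normal(size=(n_samples, n_strong)) @ basis
X += 0.05 * rng.normal(size=(n_samples, n_features))

# Singular values of the centered data give the variance along each component.
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained)

# Effective dimensionality: fewest components explaining 95% of the variance
# (the 95% threshold is an assumption, not a rule).
effective_dim = int(np.searchsorted(cumulative, 0.95) + 1)
print(effective_dim)
```

If this number is far below the raw feature count, projecting onto that many components before training is a reasonable experiment.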
The SVM's generalization power is low. First reduce the dimensionality of your data with principal component analysis, then change your SVM kernel type to RBF.

Can one conduct training an SVM with detected false positives iteratively?

I'm working on a machine learning problem in image processing. I want to get the location of an object in an image by using Histogram of Oriented Gradients (HOG) and a support vector machine (SVM). I've read a couple of articles and tutorials about training the SVM. The setup is pretty standard. I have labeled positive training images and now need to generate a set of negative training samples.
In the literature, negative training samples are very often generated by choosing positions at random. I've also seen some approaches where, in a step following the random choice of negative samples, the false positives of a detection run are added as further negative training samples.
However, I'm wondering whether one could use this approach from the start. That is, one would generate only a single negative training sample at random, run the detection, and put the false positives into the negative training set again. This seems quite an obvious strategy to me, but I wonder if I'm missing something.
The theory behind this method is laid out in Object Detection with Discriminatively Trained Part Based Models by P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan in their PAMI paper. In essence, your starting negative set does not matter, you will always converge to the same classifier if you iteratively add hard samples (with an SVM margin > -1). Starting with a single negative would simply make this convergence slower.
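The iteration described above can be sketched as a loop that retrains and then adds every negative whose score exceeds -1. This is a minimal illustration using scikit-learn's LinearSVC on synthetic feature vectors, not the procedure from the paper itself; the pool of candidate negatives, the data, and the cap on rounds are all assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)

# Synthetic stand-ins for feature vectors (e.g. HOG) of positives and of every
# candidate negative window; real data would come from scanning images.
positives = rng.normal(loc=2.0, size=(100, 20))
negative_pool = rng.normal(loc=-2.0, size=(2000, 20))

# Start from a single randomly chosen negative, as in the question.
active = {int(rng.integers(len(negative_pool)))}

for round_idx in range(50):  # safety cap on the number of rounds (assumed)
    idx = sorted(active)
    X = np.vstack([positives, negative_pool[idx]])
    y = np.concatenate([np.ones(len(positives)), -np.ones(len(idx))])
    clf = LinearSVC(C=1.0).fit(X, y)

    # Hard negatives: pool samples on or inside the margin (score > -1).
    scores = clf.decision_function(negative_pool)
    hard = set(np.flatnonzero(scores > -1.0).tolist()) - active
    if not hard:
        break  # no new hard negatives, the active set has stabilized
    active |= hard

false_positive_rate = float(np.mean(clf.decision_function(negative_pool) > 0))
print(round_idx + 1, len(active), false_positive_rate)
```

In a real detector the pool is usually not held in memory; each round instead rescans negative images with the current classifier to find new hard examples.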
To me it sounds like you want to train the SVM classifier online/incrementally, i.e. updating the classifier with new samples. Such methods are generally only used if new data becomes available over time. In your case it seems that you can generate a whole set of negative training samples, so there would be no need to train incrementally. I'm inclined to say that training the classifier in one run will be better than doing it incrementally (as hinted at by larsmans).
(Again, I'm not an image processing specialist, so take this with a grain of salt.)
I'm wondering if one could not use this approach generally from the start.
You'd need some way to detect the false positives from a classification run. To do so, you need a ground truth, that is, you need a human in the loop. In effect, you'd be doing active learning. If that's what you want to do, you could just as well start with a bunch of hand-labeled negative examples.
Alternatively, you could set this up as a PU learning problem. I have no idea whether that works well with images, but for text classification, it sometimes works.

Large Scale Image Classifier

I have a large set of plant images labeled with the botanical name. What would be the best algorithm to train on this dataset in order to classify an unlabeled photo? The photos are processed so that 100% of the pixels contain the plant (e.g. either closeups of the leaves or bark), so there are no other objects, empty space or background that the algorithm would have to filter out.
I've already tried generating SIFT features for all the photos and feeding these (feature,label) pairs to a LibLinear SVM, but the accuracy was a miserable 6%.
I also tried feeding the same data to a few Weka classifiers. The accuracy was a little better (25% with Logistic, 18% with IBk), but Weka is not designed for scalability (it loads everything into memory). Since the SIFT feature dataset has several million rows, I could only test Weka with a random 3% slice, so it's probably not representative.
EDIT: Some sample images:
Normally, you would not train on the SIFT features directly. Cluster them (using k-means) and then train on the histogram of cluster membership identifiers (i.e., a k-dimensional vector, which counts, at position i, how many features were assigned to the i-th cluster).
This way, you obtain a single output per image (and a single, k-dimensional, feature vector).
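Here's the quasi-code (using mahotas and milk in Python):
import numpy as np
from mahotas.features import surf  # module path assumed; the original import line lost it
from milk.unsupervised.kmeans import kmeans, assign_centroids
import milk
# First load your data:
images = ...
labels = ...
# SURF descriptors per image (6 octaves, 4 scales, initial step size 2)
local_features = [surf.surf(im, 6, 4, 2) for im in images]
allfeatures = np.concatenate(local_features)
# Cluster all descriptors into 100 visual words
_, centroids = kmeans(allfeatures, k=100)
# One histogram of visual-word counts per image
histograms = []
for ls in local_features:
    hist = assign_centroids(ls, centroids, histogram=True)
    histograms.append(hist)
cmatrix, _ = milk.nfoldcrossvalidation(histograms, labels)
print("Accuracy:", (100 * cmatrix.trace()) / cmatrix.sum())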
This is a fairly hard problem.
You can give the bag-of-words (BoW) model a try.
Basically, you extract SIFT features from all the images, then use k-means to cluster the features into visual words. After that, use the BoW vectors to train your classifiers.
See the Wikipedia article above and the papers it references for more details.
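The BoW steps above (cluster descriptors into visual words, then count words per image) can be sketched with NumPy alone. This is a toy illustration: random vectors stand in for real SIFT descriptors, the helper names are hypothetical, and the vocabulary size of 16 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

def nearest_centroid(points, centroids):
    """Index of the closest centroid (squared Euclidean) for each point."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; returns the (k, d) array of centroids."""
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def bow_histogram(descriptors, centroids):
    """Normalized count of descriptors assigned to each visual word."""
    words = nearest_centroid(descriptors, centroids)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()

# Hypothetical data: 12 images, each with a different number of 128-D
# descriptors (random numbers standing in for real SIFT output).
per_image = [rng.normal(size=(int(rng.integers(50, 150)), 128)) for _ in range(12)]

k = 16
vocabulary = kmeans(np.vstack(per_image), k)
features = np.array([bow_histogram(d, vocabulary) for d in per_image])
print(features.shape)
```

Each row of `features` is one fixed-length vector per image, which is what a classifier such as a linear SVM expects, regardless of how many descriptors the image produced.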
You probably need better alignment, and probably not more features. There is no way you can get acceptable performance unless you have correspondences. You need to know what points in one leaf correspond to points on another leaf. This is one of the "holy grail" problems in computer vision.
People have used shape context for this problem. You should probably look at this link. This paper describes the basic system behind leafsnap.
You can implement the BoW model according to this Bag-of-Features Descriptor on SIFT Features with OpenCV. It is a very good tutorial to implement the BoW model in OpenCV.

How to approach machine learning problems with high dimensional input space?

How should I approach a situation where I apply some ML algorithm (classification, to be more specific, and SVM in particular) to some high-dimensional input, and the results I get are not quite satisfactory?
1-, 2- or 3-dimensional data can be visualized, along with the algorithm's results, so you can get the hang of what's going on and have some idea how to approach the problem. Once the data has more than 3 dimensions, I am not really sure how to attack it, other than intuitively playing around with the parameters.
What do you do to the data? My answer: nothing. SVMs are designed to handle high-dimensional data. I'm working on a research problem right now that involves supervised classification using SVMs. Along with finding sources on the Internet, I did my own experiments on the impact of dimensionality reduction prior to classification. Preprocessing the features using PCA/LDA did not significantly increase classification accuracy of the SVM.
To me, this totally makes sense from the way SVMs work. Let x be an m-dimensional feature vector. Let y = Ax where y is in R^n and x is in R^m for n < m, i.e., y is x projected onto a space of lower dimension. If the classes Y1 and Y2 are linearly separable in R^n, then the corresponding classes X1 and X2 are linearly separable in R^m. Therefore, the original subspaces should be "at least" as separable as their projections onto lower dimensions, i.e., PCA should not help, in theory.
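To spell out the step above in the same notation (a sketch; the hyperplane normal w in R^n and offset b are symbols introduced here for illustration): if a hyperplane separates the projected classes, then the hyperplane with normal A^T w and the same offset separates the original ones.

```latex
\[
\begin{aligned}
&w^\top y + b > 0 \ \ \forall\, y \in Y_1, \qquad w^\top y + b < 0 \ \ \forall\, y \in Y_2,\\
&y = Ax \ \Longrightarrow\ w^\top y + b = (A^\top w)^\top x + b,\\
&\text{so}\quad (A^\top w)^\top x + b > 0 \ \ \forall\, x \in X_1, \qquad (A^\top w)^\top x + b < 0 \ \ \forall\, x \in X_2 .
\end{aligned}
\]
```

The converse does not hold: classes that are linearly separable in R^m can overlap after projection to R^n.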
Here is one discussion that debates the use of PCA before SVM: link
What you can do is change your SVM parameters. For example, with libsvm link, the parameters C and gamma are crucially important to classification success. The libsvm faq, particularly this entry link, contains more helpful tips. Among them:
Scale your features before classification.
Try to obtain balanced classes. If impossible, then penalize one class more than the other. See more references on SVM imbalance.
Check the SVM parameters. Try many combinations to arrive at the best one.
Use the RBF kernel first. It almost always works best (computationally speaking).
Almost forgot... before testing, cross validate!
EDIT: Let me just add this "data point." I recently did another large-scale experiment using the SVM with PCA preprocessing on four exclusive data sets. PCA did not improve the classification results for any choice of reduced dimensionality. The original data with simple diagonal scaling (for each feature, subtract mean and divide by standard deviation) performed better. I'm not making any broad conclusion -- just sharing this one experiment. Maybe on different data, PCA can help.
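The diagonal scaling mentioned above (subtract the mean and divide by the standard deviation, per feature) can be sketched in a few lines of NumPy. The data here is made up; the one design point worth noting is that the statistics are computed on the training set only and then reused for the test set.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical features on very different scales.
scales = np.array([1.0, 100.0, 0.01])
offsets = np.array([5.0, -300.0, 0.2])
X_train = rng.normal(size=(200, 3)) * scales + offsets
X_test = rng.normal(size=(50, 3)) * scales + offsets

# Statistics come from the training set only, then are reused on test data.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std[std == 0] = 1.0  # guard against constant features

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Without this step, an RBF kernel's distances would be dominated by whichever feature happens to have the largest numeric range.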
Some suggestions:
Project data (just for visualization) to a lower-dimensional space (using PCA or MDS or whatever makes sense for your data)
Try to understand why learning fails. Do you think it overfits? Do you think you have enough data? Is it possible there isn't enough information in your features to solve the task you are trying to solve? There are ways to answer each of these questions without visualizing the data.
Also, if you tell us what the task is and what your SVM output is, there may be more specific suggestions people could make.
You can try reducing the dimensionality of the problem with PCA or a similar technique. Beware that PCA has two important caveats: (1) it assumes that the data it is applied to is normally distributed, and (2) the resulting data loses its natural meaning (resulting in a black box). If you can live with that, try it.
Another option is to try several parameter selection algorithms. Since SVMs were already mentioned here, you might try the approach of Chang and Lin (Feature Ranking Using Linear SVM), in which they used a linear SVM to pre-select "interesting features" and then used an RBF-based SVM on the selected features. If you are familiar with Orange, a Python data mining library, you will be able to code this method in less than an hour. Note that this is a greedy approach which, due to its "greediness", might fail in cases where the input variables are highly correlated. In that case, and if you cannot solve this problem with PCA (see above), you might want to go to heuristic methods, which try to select the best possible combinations of predictors.
The main pitfall of this kind of approach is the high potential for overfitting. Make sure you have a set of "virgin" data that was not seen during the entire process of model building. Test your model on that data only once, after you are sure that the model is ready. If you fail, don't use this data once more to validate another model; you will have to find a new data set. Otherwise you won't be sure that you didn't overfit once more.
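The two-stage idea above (rank features with a linear SVM, then train an RBF SVM on the top-ranked ones) can be sketched with scikit-learn. This is only in the spirit of that approach, not a reproduction of the paper: the data is synthetic, ranking by absolute weight is one simple criterion, and the number of kept features is an assumption.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

rng = np.random.default_rng(6)

# Synthetic data: only features 0 and 1 carry class information.
n_samples, n_features = 400, 30
X = rng.normal(size=(n_samples, n_features))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Keep a held-out set that is never used for ranking or fitting.
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: rank features by the absolute weight of a linear SVM.
linear = LinearSVC(C=1.0, max_iter=10000).fit(X_train_s, y_train)
ranking = np.argsort(-np.abs(linear.coef_[0]))
top_k = ranking[:2]  # number of kept features is an assumption

# Step 2: train an RBF SVM on the selected features only.
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train_s[:, top_k], y_train)
held_out_accuracy = rbf.score(X_test_s[:, top_k], y_test)
print(sorted(top_k.tolist()), held_out_accuracy)
```

The held-out rows are touched exactly once, at the very end, which mirrors the warning about keeping unseen data out of the model-building process.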
Oh, and one more thing about SVM. SVM is a black box. You had better figure out the mechanism that generates the data and model the mechanism, not the data. On the other hand, if that were possible, most probably you wouldn't be here asking this question (and I wouldn't be so bitter about overfitting).
List of selected papers on parameter selection
Feature selection for high-dimensional genomic microarray data
Wrappers for feature subset selection
Parameter selection in particle swarm optimization
I worked in the laboratory that developed this Stochastic method to determine, in silico, the drug like character of molecules
I would approach the problem as follows:
What do you mean by "the results I get are not quite satisfactory"?
If the classification rate on the training data is unsatisfactory, it implies that either
You have outliers in your training data (data that is misclassified). In this case you can try algorithms such as RANSAC to deal with it.
Your model (SVM in this case) is not well suited for this problem. This can be diagnosed by trying other models (AdaBoost etc.) or adding more parameters to your current model.
The representation of the data is not well suited for your classification task. In this case preprocessing the data with feature selection or dimensionality reduction techniques would help.
If the classification rate on the training data is fine but the rate on the test data is unsatisfactory, it implies that your model overfits the data:
Either your model is too complex(too many parameters) and it needs to be constrained further,
Or you trained it on a training set which is too small and you need more data
Of course it may be a mixture of the above elements. These are all "blind" methods to attack the problem. In order to gain more insight into the problem you may use visualization methods by projecting the data into lower dimensions or look for models which are suited better to the problem domain as you understand it (for example if you know the data is normally distributed you can use GMMs to model the data ...)
If I'm not wrong, you are trying to see which SVM parameters give you the best result. Your problem is model/curve fitting.
I worked on a similar problem a couple of years ago. There are tons of libraries and algorithms for this. I used the Newton-Raphson algorithm and a variation of a genetic algorithm to fit the curve.
Generate/guess/get the result you are hoping for through a real-world experiment (or, if you are doing simple classification, just do it yourself). Compare this with the output of your SVM. The algorithms I mentioned earlier repeat this process until the result of your model (SVM in this case) roughly matches the expected values (note that this process would take some time depending on your problem/data size; it took about 2 months for me on a 140-node Beowulf cluster).
If you choose to go with Newton-Raphson's, this might be a good place to start.
