I'm using an OpenCV Haar classifier in my work, but I keep reading conflicting reports on whether it is an SVM. Can anyone clarify whether it uses an SVM? And if it does not, what advantages does the Haar method offer over an SVM approach?
SVM and boosting (AdaBoost, GentleBoost, etc.) are classification strategies/algorithms. Support Vector Machines solve a complex optimization problem, often using kernel functions, which allow us to separate samples by working in a much higher-dimensional feature space. Boosting, on the other hand, is a strategy based on combining lots of "cheap" (weak) classifiers in a smart way, which leads to very fast classification. Those weak classifiers can even be SVMs.
Haar-like features are a kind of feature that can be computed very quickly using integral images, which makes them very suitable for Computer Vision problems.
That is, Haar features are independent of the classifier: you can combine them with either of the two classification schemes.
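To make the link between integral images and Haar-like features concrete, here is a minimal pure-Python sketch. The function names (`integral_image`, `rect_sum`, `haar_two_rect_horizontal`) and the toy 4x4 image are made up for illustration and are not OpenCV's API. The point is that once the summed-area table is built, any rectangle sum costs four lookups, whatever its size.

```python
def integral_image(img):
    """Return a summed-area table padded with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Left half minus right half of the window; responds to vertical edges."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy image (illustrative data): bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
print(feature)
```

The resulting number is just a feature value; whether it is then fed to a boosted cascade or to an SVM is a separate choice.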
It isn't SVM. Here is the documentation:
http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html#haar-feature-based-cascade-classifier-for-object-detection
It uses boosting (supporting AdaBoost and a variety of other similar methods -- all based on boosting).
The important difference is speed of evaluation, which matters a lot for cascade classifiers. Their stage-based boosting algorithms allow very fast evaluation and high accuracy (in particular, they support training with many negatives), at a better balance point than an SVM for this particular application.
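The speed argument can be illustrated with a rough sketch of stage-wise evaluation. This is not OpenCV's implementation: the stage layout, the lambda "features", the thresholds and the weights below are all invented for the example. It only shows that a window failing an early, cheap stage never pays for the later ones.

```python
def evaluate_cascade(stages, window):
    """Return (accepted, number_of_weak_classifiers_evaluated)."""
    evaluated = 0
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for feature_fn, threshold, alpha in weak_classifiers:
            evaluated += 1
            if feature_fn(window) >= threshold:
                score += alpha
        if score < stage_threshold:
            return False, evaluated  # rejected early, later stages are skipped
    return True, evaluated


# Hypothetical two-stage cascade over a 4-value "window" (illustrative only).
stages = [
    # Stage 1: a single cheap contrast test.
    ([(lambda w: max(w) - min(w), 50, 1.0)], 1.0),
    # Stage 2: three further weak classifiers, only reached by survivors.
    ([(lambda w: w[0] - w[-1], 20, 0.6),
      (lambda w: sum(w) / len(w), 40, 0.6),
      (lambda w: w[1] - w[2], 10, 0.6)], 1.2),
]

flat_window = [100, 100, 100, 100]  # no contrast: rejected by stage 1
edge_window = [200, 180, 60, 40]    # strong edge: passes both stages

print(evaluate_cascade(stages, flat_window))
print(evaluate_cascade(stages, edge_window))
```

In a sliding-window detector the vast majority of windows look like `flat_window`, so the average cost per window stays close to the cost of the first stage.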
Related
So far, I have read some highly cited metric learning papers. The general idea of such papers is to learn a mapping such that mapped data points with the same label lie close to each other and far from samples of other classes. To evaluate such techniques, they report the accuracy of a KNN classifier on the generated embedding. So my question is: if we have a labelled dataset and we are interested in increasing the accuracy of a classification task, why don't we learn a classifier on the original data points? I mean, instead of finding a new embedding which suits a KNN classifier, we can learn a classifier that fits the (non-embedded) data points. Based on what I have read so far, the classification accuracy of such classifiers is much better than that of metric learning approaches. Is there a study that shows metric learning + KNN performs better than fitting a (good) classifier, at least on some datasets?
Metric learning models CAN BE classifiers. So I will answer the question of why we need metric learning for classification.
Let me give you an example. Suppose you have a dataset with millions of classes, and some classes have only a few examples, let's say fewer than 5. If you use classifiers such as SVMs or normal CNNs, you will find them almost impossible to train, because those classifiers (discriminative models) will largely ignore the classes with few examples.
But for the metric learning models, it is not a problem since they are based on generative models.
By the way, a large number of classes is itself a challenge for discriminative models.
Real-life challenges like these inspire us to explore better models.
As @Tengerye mentioned, you can use models trained using metric learning for classification. KNN is the simplest approach, but you can take the embeddings of your data and train another classifier on them, be it KNN, an SVM, a neural network, etc. The use of metric learning, in this case, would be to change the original input space into another one which is easier for a classifier to handle.
Apart from discriminative models being hard to train when the data is unbalanced or, even worse, has very few examples per class, they cannot easily be extended to new classes.
Take facial recognition, for example. If facial recognition models are trained as classification models, they will only work for the faces they have seen and won't work for any new face. Of course, you could add images of the faces you wish to add and retrain or fine-tune the model if possible, but this is highly impractical. On the other hand, facial recognition models trained using metric learning can generate embeddings for new faces, which can easily be added to the KNN, and your system can then identify the new person given his/her image.
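As a rough sketch of that last point: the `embed` function below is a hypothetical stand-in for a trained metric-learning model (here it only L2-normalises the input), and `Gallery`, the labels and the distance threshold are invented for the example. What it illustrates is that enrolling a new identity is a single append to the gallery, with no retraining.

```python
import math


def embed(sample):
    """Hypothetical stand-in for a trained embedding model: L2-normalise the input."""
    norm = math.sqrt(sum(v * v for v in sample))
    return [v / norm for v in sample]


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Gallery:
    """1-nearest-neighbour lookup over stored embeddings."""

    def __init__(self):
        self.entries = []  # list of (label, embedding)

    def enroll(self, label, sample):
        # Adding a new identity only stores one more embedding.
        self.entries.append((label, embed(sample)))

    def identify(self, sample, max_distance=0.5):
        query = embed(sample)
        best_label, best_distance = None, float("inf")
        for label, known in self.entries:
            d = distance(query, known)
            if d < best_distance:
                best_label, best_distance = label, d
        # Treat far-away queries as "unknown person".
        return best_label if best_distance <= max_distance else None


gallery = Gallery()
gallery.enroll("alice", [1.0, 0.1, 0.0])
gallery.enroll("bob", [0.0, 1.0, 0.1])

before = gallery.identify([0.1, 0.0, 1.0])  # nobody similar enrolled yet
gallery.enroll("carol", [0.0, 0.0, 1.0])    # new identity, no retraining
after = gallery.identify([0.1, 0.0, 1.0])
print(before, after)
```

A softmax classifier trained on "alice" and "bob" would need a new output unit and more training to handle "carol"; here the embedding function is left untouched.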
Is this process correct?
Suppose we have a bunch of data such as MNIST.
We just feed all this data (without labels) to an RBM and resample each data point from the trained model.
The output can then be treated as new data for classification.
Do I understand it correctly?
What is the purpose of using RBM?
You are correct. RBMs are a form of unsupervised learning algorithm that is commonly used to reduce the dimensionality of your feature space. Another common approach is to use autoencoders.
RBMs are trained using the contrastive divergence algorithm. The best overview of this algorithm comes from Geoffrey Hinton who came up with it.
https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
A great paper about how unsupervised learning improves performance can be found at http://jmlr.org/papers/volume11/erhan10a/erhan10a.pdf. The paper shows that unsupervised learning provides better generalization and better filters (if using CRBMs).
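For readers who want to see the pipeline end to end, here is a very small sketch of a Bernoulli RBM trained with one step of contrastive divergence (CD-1) on unlabelled binary vectors. It is a toy, not a tuned implementation: the class name `TinyRBM`, the learning rate, the number of epochs and the 6-bit patterns are all assumptions made for illustration. It uses the hidden-unit probabilities as the new, lower-dimensional features, which is one common way to hand RBM output to a classifier.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TinyRBM:
    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.w = [[self.rng.gauss(0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden probabilities given the data.
        h0 = self.hidden_probs(v0)
        # Negative phase: one Gibbs step (reconstruct, then re-infer hidden units).
        v1 = self.visible_probs(self.sample(h0))
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])


# Unlabelled toy data: two repeated 6-bit patterns (illustrative only).
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10
rbm = TinyRBM(n_visible=6, n_hidden=2)
for _ in range(50):
    for v in data:
        rbm.cd1_step(v)

# 2-dimensional features that a separate, supervised classifier could be trained on.
features = [rbm.hidden_probs(v) for v in data]
print(features[0], features[1])
```

The labels never enter the RBM; they are only needed later, when a classifier is fitted on `features`.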
I'm using Weka to perform classification, clustering, and some regression on a few large data sets. I'm currently trying out all the classifiers (decision tree, SVM, naive bayes, etc.).
Is there a way (in Weka or other machine learning toolkit) to sweep through all the available classifier algorithms to find the one that produces the best cross-validated accuracy or other metric?
I'd like to find the best clustering algorithm, too, for my other clustering problem; perhaps finding the lowest sum-of-squared-error?
Isn't that some kind of overfitting, too? Trying tons of classifiers, and choosing the best?
Also note that preprocessing is usually very important, and different classifiers may need different preprocessing; and each classifier has in turn a dozen or so parameters...
The same goes for clustering: don't choose a clustering algorithm by some metric. If you choose e.g. "lowest sum-of-squares", k-means will win, not because it is better, but because it is more overfit to your evaluation method: k-means optimizes the sum-of-squares. The results may be crap on other metrics, but on SSQ they are by design a local optimum.
Data mining is not something you can automate to a push-button level.
It's a skill that requires experience on how to preprocess, choose algorithms, adjust parameters and evaluate the actual outcome. Otherwise, you'd have some software on the market where you just feed your data and get the optimal classifier out.
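If you do sweep over classifiers anyway, one way to keep the selection bias visible is to pick the winner by cross-validation on a development split and then score it once on a test split that played no part in the choice. The sketch below is plain Python rather than Weka; the three toy classifiers, the synthetic two-blob data and the split sizes are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def majority(train):
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top


def nearest_centroid(train):
    sums = {}
    for x, y in train:
        total, count = sums.get(y, ([0.0] * len(x), 0))
        sums[y] = ([t + v for t, v in zip(total, x)], count + 1)
    centroids = {y: [t / c for t in total] for y, (total, c) in sums.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))


def one_nn(train):
    return lambda x: min(train, key=lambda item: sq_dist(x, item[0]))[1]


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def cv_accuracy(fit, data, k=5):
    """Mean accuracy over k folds; each fold is held out once."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        scores.append(accuracy(fit(train), folds[i]))
    return sum(scores) / k


# Synthetic data (illustrative): two Gaussian blobs centred at (0, 0) and (4, 4).
rng = random.Random(0)
data = [([rng.gauss(c, 1.0), rng.gauss(c, 1.0)], label)
        for label, c in ((0, 0.0), (1, 4.0)) for _ in range(100)]
rng.shuffle(data)
dev, test = data[:140], data[140:]  # the test split is never used for selection

candidates = {"majority": majority,
              "nearest_centroid": nearest_centroid,
              "one_nn": one_nn}
cv_scores = {name: cv_accuracy(fit, dev) for name, fit in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the winner on all development data and score it once on the test split.
test_accuracy = accuracy(candidates[best_name](dev), test)
print(best_name, cv_scores[best_name], test_accuracy)
```

The more candidates and parameter settings you sweep, the more optimistic the winning cross-validation score becomes, so the held-out number is the one to report. This does not replace the preprocessing and parameter judgement discussed above.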
I've built an algorithm for pedestrian detection using OpenCV tools. To perform classification I use a boosted classifier trained with the CvBoost class.
The problem with this implementation is that I need to feed my classifier the whole set of features I used for training. This makes the algorithm extremely slow, so much so that each image takes around 20 seconds to be fully analysed.
I need a different detection structure, and OpenCV has this Soft Cascade class that seems like exactly what I need. Its basic principle is that there is no need to examine all the features of a testing sample, since a detector can reject most negative samples using a small number of features. The problem is that I have no idea how to train one given a fully labeled set of negative and positive examples.
I can find no information about this online, so I am looking for any tips you can give me on how to use this soft cascade for classification.
Best regards
I am trying to train an AdaBoost classifier using the OpenCV library, for visual pedestrian detection.
I've come across the notion that AdaBoost allows the selection of the most relevant features. That is, if I harvest 50,000 features from images and then use them to train a classifier, at the end of the training process I would be able to select, for example, the best 2,000 out of those 50,000.
This would then allow me to harvest only those 2,000 during the actual detection process, for the sake of speed.
Is this even true? Or am I under a misconception?
If it is true, can it be done using the OpenCV library?
Best regards
Yes, this is true. That's exactly what boosting is all about.
Please, check the OpenCV documentation about training a cascade of boosted classifiers.
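To see why boosting doubles as feature selection, here is a small sketch of discrete AdaBoost over decision stumps in plain Python. It is not the OpenCV training code: the function names and the 8-sample, 4-feature toy dataset are invented for illustration. Each round picks the single feature (and threshold) with the lowest weighted error, so the trained ensemble only ever references the features it picked, and only those need to be computed at detection time.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    return polarity if row[feature] >= threshold else -polarity


def best_stump(X, y, weights):
    """Return (weighted_error, feature, threshold, polarity) with the lowest error."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted(set(row[f] for row in X)):
            for polarity in (1, -1):
                error = sum(w for row, label, w in zip(X, y, weights)
                            if stump_predict(row, f, threshold, polarity) != label)
                if best is None or error < best[0]:
                    best = (error, f, threshold, polarity)
    return best


def adaboost(X, y, rounds):
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, f, t, p = best_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, f, t, p))
        # Up-weight the samples this stump got wrong, then renormalise.
        weights = [w * math.exp(-alpha * label * stump_predict(row, f, t, p))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (illustrative): only feature 2 carries the label; the rest is noise.
X = [
    [0.2, 0.9, 1.0, 0.5], [0.8, 0.1, 1.2, 0.4], [0.5, 0.6, 0.9, 0.7], [0.1, 0.3, 1.1, 0.2],
    [0.7, 0.8, 3.0, 0.6], [0.3, 0.2, 3.3, 0.3], [0.9, 0.5, 2.9, 0.8], [0.4, 0.7, 3.1, 0.1],
]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

ensemble = adaboost(X, y, rounds=3)
selected = sorted({f for _, f, _, _ in ensemble})
print(selected)
```

With 50,000 candidate features and, say, 2,000 boosting rounds, `selected` would contain at most 2,000 indices, which is the subset you would need to extract per window afterwards.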