Why no point-of-interest detectors in Object Detection/Tracking? - opencv

I aim to detect and track vehicles in real time with OpenCV, using recorded on-road scenarios. I realize that most algorithms and papers use Haar-like features or HoG as feature descriptors, and that most of the literature on vehicle detection does not investigate point-of-interest-based approaches much.
OpenCV has so many nice edge/corner-based detectors, like FAST, ORB, and BRISK. Why not use them in conjunction with a good descriptor and do some matching/classification afterwards?
And when is it actually better to use such a detector/descriptor strategy for object detection compared with a traditional Haar-cascade or HoG/SVM approach? Is it for robustness or performance reasons?
Regards
Obi

Related

More complex hand-pose estimation algorithms

I am currently looking into hand-pose estimation in Unity without using any expensive plugins! At the moment, I have implemented a simple hand-tracking system by extracting the contours of the hand, like the link below:
https://www.youtube.com/watch?v=4QE5FcUK5ZA
However, it doesn't work brilliantly in all environments and tends not to recognise the hand when other objects are in the frame (like a face!). Does anyone know of any more sophisticated algorithms for hand-pose estimation? I've looked at using neural nets, but they tend to use a lot of CPU and/or GPU power, and I need this to be lightweight and not lag in Unity.
Anyone have any suggestions?
A multi-layered random forest is a good lightweight method for real-time hand-pose estimation: https://ieeexplore.ieee.org/document/7789644/
It uses an ensemble of regressors, each specialised on a different area of angle space. The first layer learns how to weight the output of each of these specialised regressors.
It reportedly achieves state-of-the-art results on hand-pose estimation and has been used by the author in real-time AR applications.
The model uses contour features like the ones you have extracted.
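As a rough illustration of the "weighted ensemble of specialised regressors" idea, here is a toy sketch. It is not the architecture from the linked paper: the function name `combine_experts`, the two hypothetical experts, and the hand-picked gate weights are all assumptions made for illustration. In a real system both the experts and the weights would be learned.

```python
# Toy sketch only: NOT the model from the linked paper.
# Each "expert" stands for a hypothetical regressor specialised on one
# range of hand rotation angles; a gating layer yields one weight each.

def combine_experts(expert_predictions, gate_weights):
    """Blend per-expert pose predictions using normalised gate weights."""
    total = sum(gate_weights)
    if total <= 0:
        raise ValueError("gate weights must sum to a positive value")
    n_dims = len(expert_predictions[0])
    blended = [0.0] * n_dims
    for prediction, weight in zip(expert_predictions, gate_weights):
        for d in range(n_dims):
            blended[d] += (weight / total) * prediction[d]
    return blended

# Two hypothetical experts, each predicting two joint angles in degrees.
predictions = [[10.0, 20.0], [30.0, 40.0]]
# Hypothetical gate output: the first expert is trusted three times as much.
weights = [3.0, 1.0]
pose = combine_experts(predictions, weights)
```

The point is only that the final pose is a weighted blend, so an expert that is unreliable for the current hand orientation contributes little.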

methods of face detection?

I want to know the best method of face detection, because I'm working on an application that predicts facial emotion.
Before analysing the expression of a face, whether still or moving, the face has to be detected or tracked so that the relevant information can be extracted.
Several detection methods exist, but which is best in my case?
A fast and easy way to get started with face detection is to use OpenCV's Haar detection methods (a slightly modified version of the Viola-Jones face detection algorithm, IIRC). OpenCV ships pre-trained Haar cascade classifiers for entire faces and for individual face components, e.g. eyes, nose, etc. You can also train your own if you feel so inclined. Haar features also have the advantage of being very fast to compute, so they are quite usable with video (which it sounds like you'll be using). Also, having the individual face components classified may simplify your emotion detection/prediction algorithm.
You can find the OpenCV documentation detailing Haar feature-based object recognition at http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html#viola01
and an example of performing face detection at http://code.opencv.org/projects/opencv/repository/revisions/master/entry/samples/cpp/dbt_face_detection.cpp
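To show why Haar features are so fast, here is a minimal pure-Python sketch of the integral image (summed-area table) trick from Viola-Jones: after one pass over the image, the sum of any rectangle costs four lookups. The function names and the 4x4 toy image are assumptions for illustration; this is not OpenCV's implementation.

```python
def integral_image(img):
    """Return a summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left (x, y), four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a w-by-h window (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Toy 4x4 image: dark left half (value 1), bright right half (value 9).
image = [[1, 1, 9, 9] for _ in range(4)]
ii = integral_image(image)
feature = two_rect_feature(ii, 0, 0, 4, 4)  # strongly negative: vertical edge
```

A cascade evaluates many such rectangle differences per window, which stays cheap because each one has constant cost regardless of rectangle size.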
As for the emotion detection, that's an open research question, so anything you try will likely be fairly involved. If you're into that sort of thing, some good papers to look over might be http://www.utdallas.edu/dept/eecs/research/researchlabs/msp-lab/publications/Busso_2004.pdf and http://humansensing.cs.cmu.edu/papers/Automated.pdf

How does HOG feature descriptor training work?

There don't seem to be any implementations of HOG training in OpenCV, and there are few sources on how HOG training works. From what I gathered, HOG training can be done in real time. But what are the requirements for training? How does the training process actually work?
As with most computer vision algorithms, Google Scholar is your friend :) I would suggest reading a few papers on how it works. Here is one of the most referenced papers on HoG for you to start with.
Another tip when researching in computer vision is to note the authors of the papers you find interesting, and try to find their websites. They will tend to have an implementation of their algorithms as well as rules of thumb on how to use them. Also, look up the references that are cited in the paper about your algorithm. This can be very helpful in acquiring the background knowledge to truly understand how the algorithm works and why.
Your terminology is a bit mixed up. HOG is a feature descriptor. You can train a classifier using HOG, which can in turn be used for object detection. OpenCV includes a people detector that uses HOG features and an SVM classifier. It also includes CascadeClassifier, which can use HOG, and which is typically used for face detection.
There is a program in OpenCV called opencv_traincascade, which lets you train a cascade object detector, and which gives you the option to use HOG. There is a function in the Computer Vision System Toolbox for MATLAB called trainCascadeObjectDetector, which does the same thing.
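To make "HOG is a feature descriptor" concrete, here is a simplified sketch of the core step: a gradient-orientation histogram for one cell. It assumes 9 unsigned-orientation bins over 0 to 180 degrees and uses hard binning with no interpolation or block normalisation, so it is a teaching sketch rather than what OpenCV computes. The function name `cell_histogram` is made up for this example. The resulting vectors are what you would feed to a classifier such as an SVM.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Unsigned-orientation gradient histogram for one grayscale cell."""
    h, w = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    # Border pixels are skipped so central differences stay in bounds.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# Toy cell with a vertical edge: intensity changes only along x.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
hist_v = cell_histogram(vertical_edge)

# Toy cell with a horizontal edge: intensity changes only along y.
horizontal_edge = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]
hist_h = cell_histogram(horizontal_edge)
```

The vertical edge puts all its weight in the 0-degree bin and the horizontal edge in the 90-degree bin, which is the kind of shape cue the classifier learns from.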

Is Haar Cascade the only available technique for image recognition in OpenCV

I know that there are many detection techniques in OpenCV, such as SURF, STAR, ORB, etc., but those techniques are for detecting features in a new video feed, not for dealing with specific instances of objects that require prior learning. OpenCV's documentation isn't easy to flip through, and I have yet to find anything besides Haar, which I know works best for face recognition.
So are there any other techniques besides Haar? The Haar technique dates back to research 10 years ago, so ideally I hope that there have been some more advances since then that have been implemented in OpenCV.
If you are looking for OpenCV machine learning type algorithms, check out this link.
For a state of the art on-the-fly object detection algorithm, have a look at OpenTLD. It uses bounding boxes and random forests to learn about an object over time. Check out the demo video here.
Also check out the matching_to_many_images.cpp sample from OpenCV. It uses feature descriptors to match objects much like Google Goggles works. A related example to this is the bagofwords_classification.cpp sample. It may be what you are looking for in this case. It uses feature detectors (SURF, SIFT, etc...) to detect objects and then classify them by comparing the relative positions of the features to a learned database of features. Have a look also at this tutorial from MIT.
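The descriptor-matching idea can be sketched in a few lines. This version assumes binary descriptors (as produced by detectors such as ORB or BRISK) compared with the Hamming distance, and it filters ambiguous matches with a nearest/second-nearest ratio test. The ratio test and the 0.75 threshold are my additions, not something taken from the samples named above, and the one-byte descriptors are toy data.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def match_descriptors(query, train, ratio=0.75):
    """Return (query_idx, train_idx) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches

# Toy one-byte descriptors; real binary descriptors are much longer.
train = [bytes([0b00000000]), bytes([0b11111111]), bytes([0b11110000])]
query = [bytes([0b00000001]), bytes([0b00111100])]
matches = match_descriptors(query, train)
```

The first query descriptor has one clear nearest neighbour and is kept; the second is equally far from every database entry, so it is rejected as ambiguous.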
The latentsvmdetect.cpp may also be a good starting point for you.
Hope that helps!

Difference between feature detection and object detection

I know that the most common object detection involves Haar cascades and that there are many techniques for feature detection such as SIFT, SURF, STAR, ORB, etc. But if my end goal is to recognize objects, don't both ways end up giving me the same result? I understand using feature techniques on simple shapes and patterns, but for complex objects these feature algorithms seem to work as well.
I don't need to know the difference in how they function but whether or not having one of them is enough to exclude the other. If I use Haar cascading, do I need to bother with SIFT? Why bother?
thanks
EDIT: for my purposes I want to implement object recognition on a broad class of things. Meaning that any cups that are similarly shaped as cups will be picked up as part of class cups. But I also want to specify instances, meaning a NYC cup will be picked up as an instance NYC cup.
Object detection usually consists of two steps: feature detection and classification.
In the feature detection step, the relevant features of the object to be detected are gathered.
These features are input to the second step, classification. (Even Haar cascading can be used
for feature detection, to my knowledge.) Classification involves algorithms
such as neural networks, K-nearest neighbor, and so on. The goal of classification is to find
out whether the detected features correspond to features that the object to be detected
would have. Classification generally belongs to the realm of machine learning.
Face detection is one example of object detection.
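Here is a minimal sketch of the second step, classification, using k-nearest neighbour as mentioned above. It assumes the feature detection step has already turned each image patch into a small numeric vector. The two-dimensional vectors, the labels, and the function name `knn_classify` are all made up for illustration.

```python
from collections import Counter

def knn_classify(sample, training_set, k=3):
    """Label a feature vector by majority vote of its k nearest neighbours."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(
        training_set, key=lambda item: squared_distance(sample, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (feature vector, label) pairs from the feature step.
training_set = [
    ([0.9, 0.1], "face"),
    ([0.8, 0.2], "face"),
    ([0.85, 0.15], "face"),
    ([0.1, 0.9], "background"),
    ([0.2, 0.8], "background"),
]
label = knn_classify([0.7, 0.3], training_set)
other = knn_classify([0.15, 0.85], training_set)
```

Swapping in a different classifier changes only this second step; the feature vectors stay the same.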
EDIT (Jul. 9, 2018):
With the advent of deep learning, neural networks with multiple hidden layers have come into wide use, making it relatively easy to see the difference between feature detection and object detection. A deep learning neural network consists of two or more hidden layers, each of which is specialized for a specific part of the task at hand. For neural networks that detect objects from an image, the earlier layers arrange low-level features into a many-dimensional space (feature detection), and the later layers classify objects according to where those features are found in that many-dimensional space (object detection). A nice introduction to neural networks of this kind is found in the Wolfram Blog article "Launching the Wolfram Neural Net Repository".
Normally objects are collections of features. A feature tends to be a very low-level primitive thing. An object implies moving the understanding of the scene to the next level up.
A feature might be something like a corner, an edge etc. whereas an object might be something like a book, a box, a desk. These objects are all composed of multiple features, some of which may be visible in any given scene.
Invariance, speed, and storage are a few reasons I can think of off the top of my head. The alternative would be to keep the complete image and then check whether the given image is similar to the glass images you have in your database. But if you have a compressed representation of the glass, it will need less computation (and thus be faster), it will need less storage, and the features tell you about the invariance across images.
Both of the methods you mentioned are essentially the same, with slight differences. In the case of Haar, you detect the Haar features and then boost them to increase confidence. Boosting is just a meta-classifier that smartly chooses which Haar features to include in the final classification, so that it gives a better estimate. The other method does more or less the same, except that you have more "sophisticated" features. The main difference is that you don't use boosting directly. You tend to use some sort of classification or clustering, such as MoG (Mixture of Gaussians), K-means, or some other heuristic, to cluster your data. Your clustering largely depends on your features and application.
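To illustrate boosting as a meta-classifier that picks and weights simple features, here is a compact discrete AdaBoost sketch over threshold "stumps". It is a generic textbook version, not OpenCV's cascade training. The samples stand for hypothetical Haar feature responses, and all names are made up for the example.

```python
import math

def stump(sample, f, threshold, polarity):
    """Weak classifier: thresholds a single feature, returns +1 or -1."""
    return polarity if sample[f] >= threshold else -polarity

def train_adaboost(samples, labels, n_rounds=3):
    """Discrete AdaBoost over single-feature stumps (labels are +1/-1)."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []  # entries: (alpha, feature_index, threshold, polarity)
    for _ in range(n_rounds):
        best = None
        # Pick the stump with the lowest weighted error.
        for f in range(len(samples[0])):
            for threshold in sorted({s[f] for s in samples}):
                for polarity in (1, -1):
                    err = sum(w for s, y, w in zip(samples, labels, weights)
                              if stump(s, f, threshold, polarity) != y)
                    if best is None or err < best[0]:
                        best = (err, f, threshold, polarity)
        err, f, threshold, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, threshold, polarity))
        # Re-weight: misclassified samples gain weight for the next round.
        weights = [w * math.exp(-alpha * y * stump(s, f, threshold, polarity))
                   for s, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, sample):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump(sample, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical responses of two Haar-like features per training window.
samples = [[5, 1], [6, 2], [1, 5], [2, 6]]
labels = [1, 1, -1, -1]
model = train_adaboost(samples, labels)
```

Each round keeps the single most useful feature and gives it a vote proportional to its accuracy, which is the "smart choosing" described above.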
What will work in your case? That is a tough question. If I were you, I would play around with Haar and, if it doesn't work, try the other method (obs :>). Be aware that you might want to segment the image and give it some sort of boundary within which to detect glasses.

Resources