I've built a pedestrian detection algorithm using OpenCV. For classification I use a boosted classifier trained with the CvBoost class.
The problem with this implementation is that I have to feed the classifier the whole set of features I used for training. This makes the algorithm extremely slow: each image takes around 20 seconds to analyse fully.
I need a different detection structure, and OpenCV has a Soft Cascade class that seems to be exactly what I need. Its basic principle is that there is no need to examine every feature of a test sample, since a detector can reject most negative samples using a small number of features. The problem is that I have no idea how to train one given a fully labelled set of negative and positive examples.
I can find no information about this online, so I am looking for any tips on how to use this soft cascade for classification.
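To make the early-rejection principle concrete, here is a minimal sketch in plain Python. It does not use or imitate the OpenCV Soft Cascade API and does not cover training. The stump weak classifiers, the thresholds and the `STAGES` list are all hypothetical values chosen for illustration.

```python
def stump(index, threshold, vote):
    """Hypothetical weak classifier: +vote if the feature exceeds the threshold, else -vote."""
    def weak(sample):
        return vote if sample[index] > threshold else -vote
    return weak


def soft_cascade_score(sample, stages):
    """Return (accepted, score, stages_evaluated) for one sample.

    The running score is compared with a rejection threshold after every
    weak classifier, so most negatives are dropped after a few features.
    """
    score = 0.0
    for used, (weak, reject_below) in enumerate(stages, start=1):
        score += weak(sample)
        if score < reject_below:
            return False, score, used  # rejected early, later features never computed
    return True, score, len(stages)


# Illustrative stages: (weak classifier, rejection threshold after it).
STAGES = [
    (stump(0, 0.5, 1.0), -0.5),
    (stump(1, 0.5, 0.8), -0.5),
    (stump(2, 0.5, 0.6), 0.0),
]

print(soft_cascade_score([0.1, 0.9, 0.9], STAGES))  # rejected after the first stage
print(soft_cascade_score([0.9, 0.9, 0.9], STAGES))  # passes all three stages
```

In a real detector each weak classifier would compute its feature lazily from the image window, which is where the time saving comes from.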
Best regards
Building a classifier for classical problems, such as image classification, is quite straightforward: by looking at the image we know that the pixel values contain information about the target.
However, for problems with no obvious visual pattern, how should we evaluate whether the collected features are good enough to predict the target? Is there a criterion by which we can conclude that the collected features do not work at all? Otherwise, we have to try different algorithms or classifiers to verify the predictability of the collected data. Or is there a rule of thumb saying that if classical classifiers such as SVM, random forest and AdaBoost cannot reach a reasonable accuracy (say 70%), we should give up and look for other, more relevant features?
Or, using a high-dimensional visualization tool such as t-SNE, should we give up if no clear pattern appears in a low-dimensional latent space?
First of all, there might be NO features that explain the data well enough. The data may simply be pure noise without any signal. Therefore, speaking of a "reasonable accuracy" at any fixed level, e.g. 70%, is not meaningful. For some data sets, a model that explains 40% of the variance would be fantastic.
Having said that, the simplest practical way to evaluate the input features is to calculate correlations between each of them and the target.
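As a rough illustration of that correlation check, the sketch below ranks features by the absolute Pearson correlation with the target, using only the standard library. The function names and the toy data are made up for the example. Note that Pearson correlation only captures linear relationships, so a low score does not prove a feature is useless.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, target):
    """Return feature indices sorted from most to least correlated with the target."""
    columns = list(zip(*rows))
    scores = [(abs(pearson(col, target)), i) for i, col in enumerate(columns)]
    return [i for _, i in sorted(scores, reverse=True)]


# Toy data: feature 0 tracks the target, feature 1 only weakly, feature 2 is constant.
rows = [[1, 5, 0], [2, 3, 0], [3, 6, 0], [4, 2, 0]]
target = [1, 2, 3, 4]
print(rank_features(rows, target))
```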
Many models also have their own ways of evaluating feature importance.
I started learning image recognition a few days ago, and I would like to do a project that identifies different brand logos on Android.
For example: if I take a picture of the Nike logo on an Android device, it should display "Nike".
Low computation time is my main criterion.
For this, I have done some work and started going through the OpenCV sample examples.
What would be the best image recognition approach for me?
1) Template matching: I learned that its applicability is limited mostly by the available computational power, as identifying big and complex templates can be time consuming (so I don't want to use it).
2) Feature-based detectors like SIFT/SURF/STAR (as far as I know, this would be a better option for me).
3) How about deep learning and pattern recognition? (I have been digging into this and don't know whether it would be an option for me.) Can anyone tell me whether I can use this, and why it would be a better choice for me compared with 1 and 2?
4) Haar cascade classifiers (from one of the posts on SO, I learned that Haar is not rotation and scale invariant, so I haven't concentrated much on it). Would this be a better option for me if I focused on it?
I'm now running one of my pet projects, and it requires face recognition on a Raspberry Pi, that is, detecting the area of a photo that contains a face, if there is one. So I've done some analysis of that task.
And I found this approach. The key idea is to avoid scanning the entire picture with sliding windows of different sizes, as is done in OpenCV. Instead, the photo is divided into 49 (7x7) squares, and the model is trained not only to detect whether one of the classes is present in each square, but also to determine the location and size of the detected object.
That is only 49 runs of the trained model, so I think it's possible to execute this in less than a second even on smartphones that are not state of the art. Anyway, it will be a trade-off between accuracy and performance.
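To illustrate the 7x7 grid idea, here is a small sketch of how a labelled bounding box could be assigned to one of the 49 squares, together with the location and size values a model might be trained to predict for that square. The exact encoding (centre offsets within the cell, width and height relative to the image) is an assumption for illustration, and the linked approach may use a different convention.

```python
GRID = 7  # 7 x 7 = 49 squares, as described above


def encode_box(box, image_w, image_h, grid=GRID):
    """Map a pixel box (x_min, y_min, x_max, y_max) to its grid square.

    Returns (row, col, x_offset, y_offset, rel_w, rel_h): the square that
    contains the box centre, the centre position inside that square
    (0..1), and the box size relative to the whole image.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / image_w  # centre, normalised to [0, 1]
    cy = (y_min + y_max) / 2 / image_h
    col = min(int(cx * grid), grid - 1)  # clamp a centre lying on the far edge
    row = min(int(cy * grid), grid - 1)
    x_offset = cx * grid - col
    y_offset = cy * grid - row
    rel_w = (x_max - x_min) / image_w
    rel_h = (y_max - y_min) / image_h
    return row, col, x_offset, y_offset, rel_w, rel_h


# A 100 x 100 pixel face in a 448 x 448 photo (made-up numbers).
print(encode_box((200, 70, 300, 170), 448, 448))
```

During training, only the square containing the box centre would be marked as "object present"; the other squares would be labelled as background.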
About the model
I will use a VGG-like model, probably even a bit simpler than VGG11A.
In my case a ready-made dataset already exists, so I can train a convolutional network with it.
Why is the deep learning approach better than options 1-3 you mentioned? Because of its higher accuracy on this kind of task, which has been demonstrated in practice. You can check this on Kaggle: the majority of the top models in image classification competitions are based on convolutional networks.
The only disadvantage for you: it would probably be necessary to create your own dataset to train the model.
Here is a post that I think can be useful for you: Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition. Another one: Logo recognition in images.
2) Feature-based detectors like SIFT/SURF/STAR (as far as I know,
this would be a better option for me).
Just remember that SURF is patented, and SIFT was as well until its patent expired in 2020, so check the licensing terms before any commercial use (non-commercial use is free).
4) Haar cascade classifiers (from one of the posts on SO, I learned that Haar is not rotation and scale invariant, so I haven't concentrated much on it). Would this be a better option for me if I focused on it?
It works (if I understand your question correctly); much of this depends on how you trained your classifier. You could train it to detect all kinds of rotations and scales. Anyway, I would discourage you from going with this option, as I think the other possible solutions are better suited to this case.
I have a set of data with 3 possible events. There are 24 features that affect which of the three events will happen.
I have training data with all 24 features and the event that happened.
What I want to do is use this data to predict which of the three events will happen next, given that all 24 feature values are known.
Could you suggest a machine learning algorithm that I should use to solve this problem?
This sounds like a typical classification problem in supervised learning. However, you haven't given us enough information to suggest a particular algorithm.
We would need statistics about the "shape" of your data: relative clustering and range, correlations among the features, etc. The critical points so far are that you have few classes (3) and many more features than classes. What have you considered so far? Backing up a little, what supervised classification algorithms have you researched well enough to use?
My personal approach is to hit such a generic problem with Naive Bayes or a multi-class SVM, and use the resulting classification parameters as input for feature reduction. I might also try a small neural network with one hidden layer (or none, just a single fully connected layer) and then examine the weights to eliminate extraneous features.
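As a starting point for the Naive Bayes suggestion, here is a minimal Gaussian Naive Bayes sketch using only the standard library. It assumes numeric features that are roughly normally distributed within each class. The toy data uses 2 features instead of 24 to stay readable, and all names and values are illustrative.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # var_floor keeps the variance positive when a feature is constant
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(rows)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


# Toy data: three events, two features each (made-up numbers).
rows = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
    [10.0, 0.1], [9.8, 0.0], [10.1, 0.2],
]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
model = fit_gaussian_nb(rows, labels)
print(predict(model, [5.1, 5.0]))
```

The per-class means and variances stored in `model` are the "classification parameters" one could inspect for feature reduction: a feature whose mean and variance barely differ between classes contributes little to the decision.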
Given the large dimensionality, you might also try k-means clustering to see whether the classes are already cohesive in 24-D space. Try k=6; if the classes are well separated, you would expect roughly 3 good clusters and 3 tiny ones holding outliers.
Does that get you moving toward a solution?
I am trying to train an AdaBoost classifier using the OpenCV library, for visual pedestrian detection.
I've come across the notion that AdaBoost allows the selection of the most relevant features. That is, if I extract 50,000 features from images and use them to train a classifier, at the end of the training process I would be able to select, for example, the best 2,000 out of those 50,000.
This would then allow me to extract only those 2,000 during actual detection, for the sake of speed.
Is this even true? Or am I under a misconception?
If true, is it possible to do this using the OpenCV library?
Best regards
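Yes, this is true. Boosting with weak learners that each use a single feature (such as decision stumps) selects features as a side effect: only the features picked by some weak learner are needed at detection time.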
Please check the OpenCV documentation on training a cascade of boosted classifiers.
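To show why boosting ends up selecting features, here is a minimal discrete AdaBoost sketch with decision stumps, written in plain Python. It is not the OpenCV implementation, and how to read the chosen feature indices out of an OpenCV model is not covered here. The toy data and function names are made up: only feature 2 separates the two classes, so it is the only feature the ensemble uses.

```python
import math


def stump_predict(row, f, threshold, polarity):
    """Decision stump: +polarity if feature f exceeds the threshold, else -polarity."""
    return polarity if row[f] > threshold else -polarity


def best_stump(rows, labels, weights):
    """Exhaustively find the stump with the lowest weighted error."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            for polarity in (1, -1):
                err = sum(
                    w for row, y, w in zip(rows, labels, weights)
                    if stump_predict(row, f, threshold, polarity) != y
                )
                if best is None or err < best[0]:
                    best = (err, f, threshold, polarity)
    return best


def adaboost(rows, labels, rounds):
    """Train `rounds` stumps; labels must be +1 or -1."""
    n = len(rows)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, threshold, polarity = best_stump(rows, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalise.
        weights = [
            w * math.exp(-alpha * y * stump_predict(row, f, threshold, polarity))
            for row, y, w in zip(rows, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def selected_features(ensemble):
    """Indices of the features that at least one weak learner uses."""
    return sorted({f for _, f, _, _ in ensemble})


rows = [
    [0.3, 0.9, 0.1, 0.5],
    [0.8, 0.2, 0.2, 0.4],
    [0.1, 0.7, 0.3, 0.9],
    [0.6, 0.4, 0.7, 0.1],
    [0.2, 0.1, 0.8, 0.6],
    [0.9, 0.8, 0.9, 0.3],
]
labels = [-1, -1, -1, 1, 1, 1]
print(selected_features(adaboost(rows, labels, rounds=3)))
```

At detection time, only the features returned by `selected_features` would need to be computed for each window.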
I know that most common object detection involves Haar cascades, and that there are many techniques for feature detection such as SIFT, SURF, STAR, ORB, etc. But if my end goal is to recognize objects, don't both approaches end up giving me the same result? I understand using feature techniques on simple shapes and patterns, but for complex objects these feature algorithms seem to work as well.
I don't need to know the difference in how they function but whether or not having one of them is enough to exclude the other. If I use Haar cascading, do I need to bother with SIFT? Why bother?
thanks
EDIT: for my purposes I want to implement object recognition on a broad class of things, meaning that any object shaped like a cup will be picked up as part of the class "cups". But I also want to recognize specific instances, meaning that an NYC cup will be picked up as the instance "NYC cup".
Object detection usually consists of two steps: feature detection and classification.
In the feature detection step, the relevant features of the object to be detected are gathered. These features are input to the second step, classification. (Even Haar cascading can be used for feature detection, to my knowledge.) Classification involves algorithms such as neural networks, K-nearest neighbor, and so on. The goal of classification is to find out whether the detected features correspond to features that the object to be detected would have. Classification generally belongs to the realm of machine learning.
Face detection is one example of object detection.
EDIT (Jul. 9, 2018):
With the advent of deep learning, neural networks with multiple hidden layers have come into wide use, making it relatively easy to see the difference between feature detection and object detection. A deep learning neural network consists of two or more hidden layers, each of which is specialized for a specific part of the task at hand. For neural networks that detect objects from an image, the earlier layers arrange low-level features into a many-dimensional space (feature detection), and the later layers classify objects according to where those features are found in that many-dimensional space (object detection). A nice introduction to neural networks of this kind is found in the Wolfram Blog article "Launching the Wolfram Neural Net Repository".
Normally objects are collections of features. A feature tends to be a very low-level primitive thing. An object implies moving the understanding of the scene to the next level up.
A feature might be something like a corner, an edge etc. whereas an object might be something like a book, a box, a desk. These objects are all composed of multiple features, some of which may be visible in any given scene.
Invariance, speed and storage are a few reasons I can think of off the top of my head. The alternative would be to keep the complete images and check whether a given image is similar to the glass images in your database. But a compressed representation of the glass needs less computation (and is thus faster) and less storage, and the features capture what stays invariant across images.
Both methods you mentioned are essentially the same, with slight differences. With Haar, you compute the Haar features and then boost them to increase the confidence. Boosting is simply a meta-classifier that chooses which Haar features to include in the final classification, so that it gives a better estimate. The other method does more or less the same, except that the features are more "sophisticated". The main difference is that you don't use boosting directly. Instead, you tend to use some sort of classification or clustering, such as MoG (Mixture of Gaussians), k-means or some other heuristic, to group your data. Your clustering largely depends on your features and application.
What will work in your case: that is a tough question. If I were you, I would play around with Haar and, if that doesn't work, try the other method (obs :>). Be aware that you might want to segment the image and give it some sort of boundary within which to detect glasses.
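For readers unfamiliar with what a Haar feature actually is, the sketch below computes a two-rectangle Haar-like feature with an integral image (summed-area table), in plain Python. The sign convention (left half minus right half) and the toy images are assumptions for illustration. A boosted classifier would then pick a subset of many such features.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column on the top and left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w x h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Left half minus right half of a w x h window (w should be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 4 x 4 images: a vertical edge and a flat patch.
edge = [[9, 9, 1, 1] for _ in range(4)]
flat = [[5, 5, 5, 5] for _ in range(4)]
print(haar_two_rect_horizontal(integral_image(edge), 0, 0, 4, 4))  # strong response
print(haar_two_rect_horizontal(integral_image(flat), 0, 0, 4, 4))  # no response
```

Once the integral image is built, each rectangle sum costs four lookups regardless of the rectangle size, which is what makes evaluating many Haar features per window affordable.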