I'm trying to train CascadeClassifier from OpenCV to detect a simple high-contrast company logo, but it doesn't work. What it detects looks like just random image patches. It doesn't even work on the original sample. I'm using opencv_createsamples to create a set of positives on a plain white background from a single original logo image.
At the same time, I was able to successfully train a cascade for detecting stamps using many samples from real documents. This looks strange to me, because a stamp is much more complex than a company logo.
What could I be doing wrong? Can LBP or Haar features be used to describe a simple object such as a logo?
It depends on the type of company logo and the accuracy level you need. LBP is very fast to train but less accurate than a Haar classifier; a Haar classifier can take a week to train but is very accurate. To get a good classifier you need a lot of data, and I don't know what data you have or how much. I also see that the question was asked a long time ago...
Related
Please, can anyone help me find Haar training files that work with small sample counts, such as 5? I have downloaded a couple, but one gives me error messages while the second requires 1000 samples.
Thank you very much
A handful of samples is not how these methods are designed to work. Almost all classification algorithms need a large number of training samples.
It depends on what you want to detect. If you want to detect a logo and you have a clean image of a logo, you can create many training samples out of it by adding noise, changing contrast and brightness, rotating, distorting, etc. OpenCV's Haar training module supports this, so it won't be hard.
This is called data augmentation. But if you want to detect faces, data augmentation alone won't be enough.
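As a rough sketch of what such augmentation can look like with OpenCV and NumPy (the perturbation ranges and the logo path are placeholder assumptions, not values from the question):

```python
import cv2
import numpy as np

def augment(img, rng):
    """Return one randomly perturbed copy of a clean template image."""
    h, w = img.shape[:2]

    # Random small rotation around the image center, padding with white.
    angle = rng.uniform(-15, 15)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(img, M, (w, h), borderValue=(255, 255, 255))

    # Random brightness/contrast: out = alpha * out + beta.
    out = cv2.convertScaleAbs(out, alpha=rng.uniform(0.7, 1.3),
                              beta=rng.uniform(-30, 30))

    # Additive Gaussian noise.
    noise = rng.normal(0, 8, out.shape)
    return np.clip(out.astype(np.float64) + noise, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
logo = cv2.imread("logo.png")  # placeholder path
samples = [augment(logo, rng) for _ in range(1000)]
```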
Creating a rule-based system by observing the few samples that you have works best for this situation, if what you want to detect is a natural object.
I can add additional links to this answer, pointing to sample code, if you can provide more details.
I'm trying to use the HOG detector in OpenCV to detect 3 types of objects in a video feed captured through a fisheye lens. The types are:
People
Books (when held by some person)
Chairs
A snapshot from my video looks like this image from this website:
I set up the HOG classifier using the default people detector and first tried to detect the people. I noticed that when the people were the size you would expect from a non-fisheye lens (something you would get with a standard 35mm lens), they were detected; otherwise they were not. This seemed logical, as the classifier expects people to be a standard size.
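For reference, that default-detector setup is roughly the following (the frame path and the detectMultiScale parameters are placeholder assumptions):

```python
import cv2

# Standard OpenCV pedestrian detector: HOG features + a pre-trained linear SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("snapshot.png")  # one frame from the fisheye feed
# winStride/padding/scale trade speed against detection density.
rects, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                      padding=(8, 8), scale=1.05)
for (x, y, w, h) in rects:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```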
I was wondering how I could modify the classifier to detect people through a fisheye lens. The options I see are these:
Undistort the fisheye effect and run the classifier - I would rather not do this, because currently I'm not in a position to calibrate the camera and obtain the distortion coefficients
Distort people images from a people image dataset to approximate the distortion I get in my video and re-train the classifier - I think this would work, but I would like to understand whether it would really work the way I think it would.
My question is:
What would be a valid approach to this problem? Will option #2 work for all 3 types of objects (people, books, and chairs)?
What is a good classifier that can be trained to identify the 3 types of objects (cascade, HOG, or anything else - please suggest a library as well)? Will my #2 method of distorting and training with positive and negative examples be a good solution?
Retraining the HOG detector to the performance level of the one included with OpenCV would be a fairly involved process. You would also have to simulate the distortion of your specific lens to modify the training data.
For the quickest solution I would recommend your first option of undistorting the image (a rough sketch follows the links below). If you are willing to put in the time and resources to retrain the classifier (which you may have to do anyway, depending on how you detect chairs and books), there are some publicly available pedestrian datasets that will be useful:
1) http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
2) http://pascal.inrialpes.fr/data/human/
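If you later manage to obtain even rough intrinsics, the undistortion itself is a single call. A minimal sketch, where the camera matrix and distortion coefficients are placeholder assumptions you would normally get from cv2.calibrateCamera (or cv2.fisheye.calibrate for strong fisheye distortion):

```python
import cv2
import numpy as np

frame = cv2.imread("fisheye_frame.png")  # placeholder path
h, w = frame.shape[:2]

# Placeholder intrinsics - in practice these come from camera calibration.
K = np.array([[w, 0, w / 2],
              [0, w, h / 2],
              [0, 0, 1]], dtype=np.float64)
dist = np.array([-0.3, 0.1, 0.0, 0.0], dtype=np.float64)  # k1, k2, p1, p2

undistorted = cv2.undistort(frame, K, dist)
cv2.imwrite("undistorted.png", undistorted)
```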
It's unlikely that you'll be able to find a chair cascade, due to the variability in chair design. I would recommend you train your own cascade on the specific chairs you intend to detect. I don't know of any existing cascade for books either, and a quick Google search didn't yield any promising results. A good data resource, if you intend to train your own cascade for books, is ImageNet.
In my application I have to track the lecturer in a university lecture using a static camera. At the moment I'm using the default GPUHOGDescriptor from Emgu CV, which works well if the whole body of the lecturer is visible. In the case where the lecturer is standing behind the desk, the detection works only around 20% of the time. My idea is to use a HOG detector trained on only the upper half of the body. I couldn't find such a detector on the Internet, but I'm sure that I'm not the first one with this problem. Or is there a fundamental reason why upper-body detection does not work?
Can someone help me find one or share their descriptor? If I wanted to train a HOG descriptor myself, would it work to use a standard dataset like INRIA and change only the window size so that it covers only the upper half of the images?
It would work: change the window size to something like 100x90 and train the SVM detector with a set of images of the upper half of the body. This would improve accuracy but would cost you performance.
You need to compute the HOG features of all your positive and negative sample images using the HOGDescriptor::compute function, then feed the results to an SVM library such as SVMlight. This page will help you compute the features and get the resulting model from SVMlight. The model will be available in genfiles/descriptorVector
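A rough sketch of that pipeline using OpenCV's built-in SVM instead of SVMlight (the window size, HOG layout, and file lists below are placeholder assumptions):

```python
import cv2
import numpy as np

# Assumed 64x96 window for upper-body crops; block/cell sizes must divide it.
win = (64, 96)
hog = cv2.HOGDescriptor(win, (16, 16), (8, 8), (8, 8), 9)

def features(paths):
    feats = []
    for p in paths:
        img = cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), win)
        feats.append(hog.compute(img).flatten())
    return feats

positive_paths = ["pos_000.png", "pos_001.png"]  # placeholder file lists
negative_paths = ["neg_000.png", "neg_001.png"]

pos = features(positive_paths)  # crops of the upper half of the body
neg = features(negative_paths)  # background crops of the same size
X = np.array(pos + neg, dtype=np.float32)
y = np.array([1] * len(pos) + [-1] * len(neg), dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(X, cv2.ml.ROW_SAMPLE, y)
```

To plug the trained model back into HOGDescriptor::setSVMDetector you would still need to convert the linear SVM's support vectors and bias into a single primal weight vector.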
I'm doing a project which needs to detect/classify some simple sign language.
I'm new to OpenCV. I have tried to use contours and convex hulls, but they seem very hard to apply...
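What I tried looks roughly like the following sketch (OpenCV 4 return signatures; the Otsu thresholding and the largest-blob assumption are mine):

```python
import cv2

img = cv2.imread("hand.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
# Binarize with Otsu's method to separate the hand from the background.
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
hand = max(contours, key=cv2.contourArea)  # assume the largest blob is the hand
hull = cv2.convexHull(hand)
```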
I googled and found the method called "Haar cascade", which seems to be about taking pictures and creating an .xml file.
So I decided to try a Haar cascade...
Here are some examples of the sign language that I want to detect/classify:
Set1 : http://www.uppic.org/image-B600_533D7A09.jpg
Set2 : http://www.uppic.org/image-0161_533D7A09.jpg
The result I want is to classify these 2 sets.
Any suggestions on whether I could use the Haar cascade method for this?
*I'm using Xcode with my webcam, but soon I'm going to port this to an iOS device. Is that possible?
First of all: I would not use Haar features for learning on whole images.
Recall what Haar features look like: simple black-and-white rectangle patterns, such as edge features, line features, and center-surround features.
Let me point out how learning works. We're building a classifier that consists of many 'weak' classifiers. Roughly speaking, every 'weak' classifier is built to extract information about several Haar features. To simplify, let's pick one of them for consideration: the first of the edge features, a vertical one. During learning we compute a threshold value by sliding this feature over the whole input training image, using the feature as a mask: we sum the pixels 'under' the white part of the feature, sum the pixels 'under' the black part, and subtract one value from the other. In our case, the threshold value tells us whether a vertical edge exists in the training image. After training one weak classifier, you repeat the process with different Haar features, so every weak classifier gives information about different features.
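To make the white-minus-black sum concrete, here is a sketch of evaluating one vertical-edge feature with an integral image (the image path and the feature's position and size are arbitrary placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
ii = cv2.integral(img)  # (h+1, w+1) summed-area table

def rect_sum(x, y, w, h):
    """Sum of pixels inside a rectangle, via 4 integral-image lookups."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

# Two-rectangle vertical edge feature at (x, y), each half of size w x h:
x, y, w, h = 10, 10, 8, 16
response = rect_sum(x, y, w, h) - rect_sum(x + w, y, w, h)
# |response| is large where a vertical edge separates bright from dark.
```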
What is important: I summarized how training works to describe what kinds of objects are well suited to this approach. Let's take the most powerful application, detecting a human face. A face has two important properties:
It has landmarks that are contrastive (they differ from the background, the skin)
The landmarks' locations are correlated with each other in every face (e.g., the distance between them is, approximately, some factor of the face size)
That is what makes Haar features powerful in this case. As you can see, one can easily point out Haar features that are useful for face detection; e.g., the first and second line features are good for detecting a nose.
Back to your problem: ask yourself whether your problem has properties 1 and 2. In the case of a whole image there is too much unnecessary data - the background, folds in the person's shirt - and we don't want to add that noise to the classifier.
Secondly, I would not use Haar features even on cropped palm regions.
I think the difference between the palms is too small for a Haar classifier; you can derive that from the description above. The palms do not differ much, so the computed threshold levels will be too similar. The most significant features for Haar on the given palms would be the 'edges' between the fingers and the palm edges. You can't rely on the palm's edges - they depend on the background (walls, clothes, etc.) - and the edges between fingers carry too little information. I claim this because I have experience training a Haar classifier for palms: it started to work only once we cropped the palm region so that it contained the fingers.
I want to develop an application in which the user inputs an image (of a person) and the system identifies the face in it. The system should also work if there is more than one person in the image.
I need the logic: I don't have any idea how to work with image pixel data in such a way that it identifies faces.
Eigenface might be a good algorithm to start with if you're looking to build a system for educational purposes, since it's relatively simple and serves as the starting point for a lot of other algorithms in the field. Basically what you do is take a bunch of face images (training data), switch them to grayscale if they're RGB, resize them so that every image has the same dimensions, make the images into vectors by stacking the columns of the images (which are now 2D matrices) on top of each other, compute the mean of every pixel value in all the images, and subtract that value from every entry in the matrix so that the component vectors won't be affine. Once that's done, you compute the covariance matrix of the result, solve for its eigenvalues and eigenvectors, and find the principal components. These components will serve as the basis for a vector space, and together describe the most significant ways in which face images differ from one another.
Once you've done that, you can compute a similarity score for a new face image by converting it into a face vector, projecting it into the new vector space, and computing the linear distance between it and the other projected face vectors.
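A compressed sketch of that recipe in NumPy (the gallery here is random placeholder data; the trick of diagonalizing the small n x n Gram matrix instead of the huge pixel covariance is standard):

```python
import numpy as np

# X: one flattened, equally sized grayscale face per row.
# Placeholder data - in practice, load, grayscale, and resize your gallery.
n, h, w = 40, 64, 64
X = np.random.rand(n, h * w)

mean = X.mean(axis=0)
A = X - mean  # centered face vectors

# Eigenvectors of the small n x n Gram matrix yield the eigenfaces cheaply.
vals, vecs = np.linalg.eigh(A @ A.T)
order = np.argsort(vals)[::-1][:16]        # keep the top 16 components
eigenfaces = A.T @ vecs[:, order]          # shape (h*w, 16)
eigenfaces /= np.linalg.norm(eigenfaces, axis=0)

def project(face):
    """Coordinates of a face vector in the eigenface basis."""
    return eigenfaces.T @ (face - mean)

# Similarity score: Euclidean distance between projections.
d = np.linalg.norm(project(X[0]) - project(X[1]))
```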
If you decide to go this route, be careful to choose face images that were taken under an appropriate range of lighting conditions and pose angles. Those two factors play a huge role in how well your system will perform when presented with new faces. If the training gallery doesn't account for the properties of a probe image, you're going to get nonsense results. (I once trained an eigenface system on random pictures pulled down from the internet, and it gave me Bill Clinton as the strongest match for a picture of Elizabeth II, even though there was another picture of the Queen in the gallery. They both had white hair, were facing in the same direction, and were photographed under similar lighting conditions, and that was good enough for the computer.)
If you want to pull faces from multiple people in the same image, you're going to need a full system to detect faces, pull them into separate files, and preprocess them so that they're comparable with other faces drawn from other pictures. Those are all huge subjects in their own right. I've seen some good work done by people using skin color and texture-based methods to cut out image components that aren't faces, but these are also highly subject to variations in training data. Color casting is particularly hard to control, which is why grayscale conversion and/or wavelet representations of images are popular.
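For the detection-and-cropping stage, OpenCV's stock frontal-face Haar cascade is the usual starting point. A minimal sketch (the photo path and the 64x64 normalization size are assumptions):

```python
import cv2

# The cascade file ships with the opencv-python package.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("group_photo.png")  # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for i, (x, y, w, h) in enumerate(faces):
    crop = cv2.resize(gray[y:y + h, x:x + w], (64, 64))  # normalize size
    cv2.imwrite(f"face_{i}.png", crop)
```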
Machine learning is the keystone of many important processes in an FR system, so I can't stress the importance of good training data enough. There are a bunch of learning algorithms out there, but the most important one in my view is the naive Bayes classifier; the other methods converge on Bayes as the size of the training dataset increases, so you only need to get fancy if you plan to work with smaller datasets. Just remember that the quality of your training data will make or break the system as a whole, and as long as it's solid, you can pick whatever trees you like from the forest of algorithms that have been written to support the enterprise.
EDIT: A good sanity check for your training data is to compute average faces for your probe and gallery images. (This is exactly what it sounds like; after controlling for image size, take the sum of the RGB channels for every image and divide each pixel by the number of images.) The better your preprocessing, the more human the average faces will look. If the two average faces look like different people -- different gender, ethnicity, hair color, whatever -- that's a warning sign that your training data may not be appropriate for what you have in mind.
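That sanity check is only a few lines (assuming the images were already resized to a common shape; the file list is a placeholder):

```python
import cv2
import numpy as np

paths = ["gallery_000.png", "gallery_001.png"]  # placeholder file list
acc = None
for p in paths:
    img = cv2.imread(p).astype(np.float64)  # sum in float to avoid overflow
    acc = img if acc is None else acc + img
mean_face = (acc / len(paths)).astype(np.uint8)
cv2.imwrite("average_face.png", mean_face)
```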
Have a look at the Face Recognition Homepage - there are algorithms, papers, and even some source code.
There are many, many different algorithms out there. Basically, what you are looking for is "computer vision". We did a university project based around facial recognition and detection. What you need to do is google extensively and try to understand all this stuff. There is a bit of mathematics involved, so be prepared. First go to Wikipedia; then search for PDF publications on specific algorithms.
You can go the hard way - write an implementation of all the algorithms yourself - or the easy way - use a computer vision library like OpenCV or OpenVIDIA.
And actually it is not that hard to make something that will work, so be brave. What is a lot harder is making software that works under different and constantly varying conditions, and that is where Google won't help you. But I suppose you don't want to go that deep.