I'm doing my project which need to detect/classify some simple sign language.
I'm new to opencv, I have try to use contours,hull but it seem very hard to apply...
I googled and find the method call "Haarcascade" which seem to be about taking pictures and create .xml file.
So, I decide to do Haarcascade......
Here are some example of the sign language that I want to detect/classify
Set1 : http://www.uppic.org/image-B600_533D7A09.jpg
Set2 : http://www.uppic.org/image-0161_533D7A09.jpg
The result I want here is to classify these 2 set.
Any suggestion if I could use haarcascade method with this
*I'm using xcode with my webcam, but soon I'm gonna port them onto iOS device. Is it possible?
First of all: I would not use haar features for learning on whole images.
Let's see how haar features look like:
Let me point out how learning works. We're building a classifier that consists of many 'weak' classifiers. In approximation, every 'weak' classifier is built in such way to find out information about several haar features. To simplify, let's peek one of them to consideration, a first one from edge features. During learning in some way, we compute a threshold value by sliding this feature over the whole input training image, using feature as a mask: we sum pixels 'under' the white part of the feature, sum pixels 'under' black part and subtract one value from other. In our case, threshold value will give an information if vertical edge feature exists on the training image. After training of weak classifier, you repeat process with different haar features. Every weak classifier gives information about different features.
What is important: I summarized how training works to describe what kind of objects are good to be trained in such way. Let's pick the most powerful application - detecting human's face. There's an important feature of face:
It has a landmarks which are constrastive (they differ from background - skin)
The landmark's locations are correlated to each other in every face (e.g. distance between them in approximation is some factor of face size)
That makes haar features powerful in that case. As you can see, one can easily point out haar features which are useful for face detection e.g. first and second of line features are good for detection a nose.
Back to your problem, ask yourself if your problem have features 1. and 2. In case of whole image, there is too much unnecessary data - background, folds on person's shirt and we don't want to noise classifier with it.
Secondly, I would not use haar features from some cropped regions.
I think the difference between palms is too less for haar classifier. You can derive that from above description. The palms are not different so much - the computed threshold levels will be too similar. The most significant features for haar on given palms will be 'edges' between fingers and palm edges. You can;t rely on palm's edges - it depends from the background (walls, clothes etc.) And edges between fingers are carrying too less information. I am claiming that because I have an experience with learning haar classifier for palm. It started to work only if we cropped palm region containing fingers.
Related
My xmas holiday project this year was to build a little Android app, which should be able to detect arbitrary Euro coins in a picture, recognize their value and sum the values up.
My assumptions/requirements for the picture for a good recognition are
uniform background
picture should be roughly the size of a DinA4 paper
coins may not overlap, but may touch each other
number-side of the coins must be up/visible
My initial thought was, that for the coin value-recognition later it would be best to first detect the actual coins/their regions in the picture. Any recognition then would run on only these regions of the picture, where actual coins are found.
So the first step was to find circles. This i have accomplished using this OpenCV 3 pipeline, as suggested in several books and SO postings:
convert to gray
CannyEdge detection
Gauss blurring
HoughCircle detection
filtering out inner/redundant circles
The detection works rather successfully IMHO, here a picture of the result:
Coins detected with HoughCircles with blue border
Up to the recognition now for every found coin!
I searched for solutions to this problem and came up with
template matching
feature detection
machine learning
The template matching seems very inappropriate for this problem, as the coins can be arbitrary rotated with respect to a template coin (and the template matching algorithm is not rotation-invariant! so i would have to rotate the coins!).
Also pixels of the template coin will never exactly match those of the region of the formerly detected coin. So any algorithm computing the similarity will produce only poor results, i think.
Then i looked into feature detection. This seemed more appropriate to me. I detected the features of a template-coin and the candidate-coin picture and drew the matches (combination of ORB and BRUTEFORCE_HAMMING). Unfortunately the features of the template-coin were also detected in the wrong candidate coins.
See the following picture, where the template or "feature" coin is on the left, a 20 Cents coin. To the right there are the candidate coin, where the left-most coin is a 20 Cents coin. I actually expected this coin to have the most matches, unfortunately not. So again, this seems not to be a viable way to recognize the value of coins.
Feature-matches drawn between a template coin and candidate coins
So machine learning is the third possible solution. From university i still now about neural networks, how they work, etc. Unfortunately my practical knowledge is rather poor AND i don't know Support Vector Machines (SVM) at all, which is the machine learning supported by OpenCV.
So my question is actually not source-code related, but more how to setup the learning process.
Should i learn on the plain coin-images or should i first extract features and learn on the features? (i think: features)
How much positives and negatives per coin should be given?
Would i have to learn also on rotated coins or would this rotation be handled "automagically" by the SVM? So would the SVM recognize rotated coins, even if i only trained it on non-rotated coins?
One of my picture-requirements above ("DinA4") limits the size of the coin to a certain size, e.g. 1/12 of the picture-height. Should i learn on coins of roughly the same size or different sizes? I think, that different sizes would result in different features, which would not help the learning process, what do you think?
Of course, if you have a different possible solution, this is also welcome!
Any help is appreciated! :-)
Bye & Thanks!
Answering your questions:
1- Should i learn on the plain coin-images or should i first extract features and learn on the features? (i think: features)
For many object classification tasks it's better to extract the features first and then train a classifier using a learning algorithm. (e.g the features can be HOG and the learning algorithm can be something like SVM or Adaboost). It's mainly due to the fact that the features have more meaningful information compared to the pixel values. (They can describe edges,shapes, texture, etc.) However, the algorithms like deep learning will extract the useful features as a part of learning procedure.
2 - How much positives and negatives per coin should be given?
You need to answer this question depending on the variation in the classes you want to recognize and the learning algorithm you use. For SVM , if you use HOG features and want to recognize specific numbers on coins you won't need much.
3- Would i have to learn also on rotated coins or would this rotation be handled "automagically" by the SVM? So would the SVM recognize rotated coins, even if i only trained it on non-rotated coins?
Again it depends on your final decision about the features(not SVM which is the learning algorithm) you're going to choose. HOG features are not rotation invariant but there are features like SIFT or SURF which are.
4-One of my picture-requirements above ("DinA4") limits the size of the coin to a certain size, e.g. 1/12 of the picture-height. Should i learn on coins of roughly the same size or different sizes? I think, that different sizes would result in different features, which would not help the learning process, what do you think?
Again, choose your algorithm , some of them ask you for a fixed/similar width/height ratio. You can find out about the specific requirements in related papers.
If you decide to use SVM take a look at this and also if you feel ok with Neural Network, using Tensorflow is a good idea.
I'm trying to perform image recognition of large birds, and since the camera is moving, my usual tactic of background removal and shape recognition won't be effective. Since all adult birds of the species are remarkably similar in coloration, I was thinking that Haar classifiers may be effective, and found some resources to try and train my own classifier.
Negative images should be significantly larger than the positive images, and present similar environments but not contain the positive image. However, I haven't been able to find too many details on what makes for a good positive training image. I found some references to trying to keep a similar aspect ratio in all positive training images, but for something like a bird that can change dramatically in different poses (wings open, wings closed), is it crucial? Is it better to train multiple classifiers for each pose and orientation of the target to be recognized? Is it better to instead try to identify a subset of the object that is relatively consistent (like the bird's very distinctive black and white head)?
If I use a sub-feature like the head, does it matter if it is "mirrored" in some views? Should I artificially make my positive images face the same direction, and when running the classifier on an image, run it twice, once mirrored? What are the considerations made when designing the positive image set for a classifier?
At the end of the introduction to this instructive kaggle competition, they state that the methods used in "Viola and Jones' seminal paper works quite well". However, that paper describes a system for binary facial recognition, and the problem being addressed is the classification of keypoints, not entire images. I am having a hard time figuring out how, exactly, I would go about adjusting the Viola/Jones system for keypoint recognition.
I assume I should train a separate classifier for each keypoint, and some ideas I have are:
iterate over sub-images of a fixed size and classify each one, where an image with a keypoint as center pixel is a positive example. In this case I'm not sure what I would do with pixels close to the edge of the image.
instead of training binary classifiers, train classifiers with l*w possible classes (one for each pixel). The big problem with this is that I suspect it will be prohibitively slow, as every weak classifier suddenly has to do l*w*original operations
the third idea I have isn't totally hashed out in my mind, but since the keypoints are each parts of a greater part of a face (left, right center of an eye, for example), maybe I could try to classify sub-images as just an eye, and then use the left, right, and center pixels (centered in the y coordinate) of the best-fit subimage for each face-part
Is there any merit to these ideas, and are there methods I haven't thought of?
however, that paper describes a system for binary facial recognition
No, read the paper carefully. What they describe is not face specific, face detection was the motivating problem. The Viola Jones paper introduced a new strategy for binary object recognition.
You could train a Viola Jones style Cascade for eyes, another for a nose, and one for each keypoint you are interested in.
Then, when you run the code - you should (hopefully) get 2 eyes, 1 nose, etc, for each face.
Provided you get the number of items you expected, you can then say "here are the key points!" What takes more work is getting enough data to build a good detector for each thing you want to detect, and gracefully handling false positives / negatives.
I ended up working on this problem extensively. I used "deep learning," aka several layers of neural networks. I used convolutional networks. You can learn more about them by checking out these demos:
http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html
http://deeplearning.net/tutorial/lenet.html#lenet
I made the following changes to a typical convolutional network:
I did not do any down-sampling, as any loss of precision directly translates to a decrease in the model's score
I did n-way binary classification, with each pixel being classified as a keypoint or non-keypoint (#2 in the things I listed in my original post). As I suspected, computational complexity was the primary barrier here. I tried to use my GPU to overcome these issues, but the number of parameters in the neural network were too large to fit in GPU memory, so I ended up using an xl amazon instance for training.
Here's a github repo with some of the work I did:
https://github.com/cowpig/deep_keypoints
Anyway, given that deep learning has blown up in popularity, there are surely people who have done this much better than I did, and published papers about it. Here's a write-up that looks pretty good:
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
I've read a lot about Viola Jones method but i still not understand about "Weak Classifier", "Strong Classifier", "Sub Window" in Rectangle features, what is definition about them. And what about "threshold"? How i can know the threshold value?
Can anyone help me? Thanks Before
Aim of Viola-Jones algorithm: Detection of faces in an image. This algorithm uses frontal upright faces, thus in order to be detected, the entire face must point towards the camera and should not be tilted to either side. Algorithm isface image partition based on physical estimation of position of eyes, noseand mouth on face.
Stages of the algorithm: This algorithm works in following four stages:
1. Haar features
2. Integral image
3. AdaBoost
4. Cascading
All these stages are discussed below. Before that i will answer a simple question that **why haar** ?
Haar wavelets are preferred because it is better than fourier for feature extraction.
Now, we will discuss about the stages involved in this algorithm.
Haar features: Over the given input image, a 24 x 24 base window will slide while passing haar as an argument and computation will take place usingconvolution theorem.What are different haar features, you can study aboutthem here
The output of this phase will be the detection of light and dark parts of an image.
Integral Image:The haar features extracted in above phase will be verylarge which will make computation very complex. To make this computationsimple and short, these extracted haar features are passed to integral image.It calculates the pixel values using simple mathematics. You can learn thiscalculation in the link provided above.
AdaBoost: As there will be so many features, all of them will not includeface in it. From, integral image we will get two possible things: featurescontaining face and features containing no face. We need only those featureswhich contains face. This job will be done by Adaboost. It will help to sampleface from rest of the body parts using weak classifiers and cascade. Theoverall process used is ensemble method. A weighted arrangement of all thesefeatures are used in evaluating and deciding any given window has face or not.It will eliminate all redundant features
Cascading: Weak classifiers will be cascaded to make a one strong singleclassifier while window sliding over the whole image.This process is alsoknown as boosting up the weak classifiers. A sub-window classified as a faceis passed on to the next stage in the cascade.It follows that the additionalstages a given sub-window passes, the higher chances that the sub-windowreally contains a face.
Next what: This model will be tested on real images and faces will bedetected.
Use-case of Viola-Jones: This model can be run on CPU, hence can beexperimented for learning purpose.
With Regards,
Ekta Smothra
I want to develop an application in which user input an image (of a person), a system should be able to identify face from an image of a person. System also works if there are more than one persons in an image.
I need a logic, I dont have any idea how can work on image pixel data in such a manner that it identifies person faces.
Eigenface might be a good algorithm to start with if you're looking to build a system for educational purposes, since it's relatively simple and serves as the starting point for a lot of other algorithms in the field. Basically what you do is take a bunch of face images (training data), switch them to grayscale if they're RGB, resize them so that every image has the same dimensions, make the images into vectors by stacking the columns of the images (which are now 2D matrices) on top of each other, compute the mean of every pixel value in all the images, and subtract that value from every entry in the matrix so that the component vectors won't be affine. Once that's done, you compute the covariance matrix of the result, solve for its eigenvalues and eigenvectors, and find the principal components. These components will serve as the basis for a vector space, and together describe the most significant ways in which face images differ from one another.
Once you've done that, you can compute a similarity score for a new face image by converting it into a face vector, projecting into the new vector space, and computing the linear distance between it and other projected face vectors.
If you decide to go this route, be careful to choose face images that were taken under an appropriate range of lighting conditions and pose angles. Those two factors play a huge role in how well your system will perform when presented with new faces. If the training gallery doesn't account for the properties of a probe image, you're going to get nonsense results. (I once trained an eigenface system on random pictures pulled down from the internet, and it gave me Bill Clinton as the strongest match for a picture of Elizabeth II, even though there was another picture of the Queen in the gallery. They both had white hair, were facing in the same direction, and were photographed under similar lighting conditions, and that was good enough for the computer.)
If you want to pull faces from multiple people in the same image, you're going to need a full system to detect faces, pull them into separate files, and preprocess them so that they're comparable with other faces drawn from other pictures. Those are all huge subjects in their own right. I've seen some good work done by people using skin color and texture-based methods to cut out image components that aren't faces, but these are also highly subject to variations in training data. Color casting is particularly hard to control, which is why grayscale conversion and/or wavelet representations of images are popular.
Machine learning is the keystone of many important processes in an FR system, so I can't stress the importance of good training data enough. There are a bunch of learning algorithms out there, but the most important one in my view is the naive Bayes classifier; the other methods converge on Bayes as the size of the training dataset increases, so you only need to get fancy if you plan to work with smaller datasets. Just remember that the quality of your training data will make or break the system as a whole, and as long as it's solid, you can pick whatever trees you like from the forest of algorithms that have been written to support the enterprise.
EDIT: A good sanity check for your training data is to compute average faces for your probe and gallery images. (This is exactly what it sounds like; after controlling for image size, take the sum of the RGB channels for every image and divide each pixel by the number of images.) The better your preprocessing, the more human the average faces will look. If the two average faces look like different people -- different gender, ethnicity, hair color, whatever -- that's a warning sign that your training data may not be appropriate for what you have in mind.
Have a look at the Face Recognition Hompage - there are algorithms, papers, and even some source code.
There are many many different alghorithms out there. Basically what you are looking for is "computer vision". We had made a project in university based around facial recognition and detection. What you need to do is google extensively and try to understand all this stuff. There is a bit of mathematics involved so be prepared. First go to wikipedia. Then you will want to search for pdf publications of specific algorithms.
You can go a hard way - write an implementaion of all alghorithms by yourself. Or easy way - use some computer vision library like OpenCV or OpenVIDIA.
And actually it is not that hard to make something that will work. So be brave. A lot harder is to make a software that will work under different and constantly varying conditions. And that is where google won't help you. But I suppose you don't want to go that deep.