I've read a lot about Viola Jones method but i still not understand about "Weak Classifier", "Strong Classifier", "Sub Window" in Rectangle features, what is definition about them. And what about "threshold"? How i can know the threshold value?
Can anyone help me? Thanks Before
Aim of Viola-Jones algorithm: Detection of faces in an image. This algorithm uses frontal upright faces, thus in order to be detected, the entire face must point towards the camera and should not be tilted to either side. Algorithm isface image partition based on physical estimation of position of eyes, noseand mouth on face.
Stages of the algorithm: This algorithm works in following four stages:
1. Haar features
2. Integral image
3. AdaBoost
4. Cascading
All these stages are discussed below. Before that i will answer a simple question that **why haar** ?
Haar wavelets are preferred because it is better than fourier for feature extraction.
Now, we will discuss about the stages involved in this algorithm.
Haar features: Over the given input image, a 24 x 24 base window will slide while passing haar as an argument and computation will take place usingconvolution theorem.What are different haar features, you can study aboutthem here
The output of this phase will be the detection of light and dark parts of an image.
Integral Image:The haar features extracted in above phase will be verylarge which will make computation very complex. To make this computationsimple and short, these extracted haar features are passed to integral image.It calculates the pixel values using simple mathematics. You can learn thiscalculation in the link provided above.
AdaBoost: As there will be so many features, all of them will not includeface in it. From, integral image we will get two possible things: featurescontaining face and features containing no face. We need only those featureswhich contains face. This job will be done by Adaboost. It will help to sampleface from rest of the body parts using weak classifiers and cascade. Theoverall process used is ensemble method. A weighted arrangement of all thesefeatures are used in evaluating and deciding any given window has face or not.It will eliminate all redundant features
Cascading: Weak classifiers will be cascaded to make a one strong singleclassifier while window sliding over the whole image.This process is alsoknown as boosting up the weak classifiers. A sub-window classified as a faceis passed on to the next stage in the cascade.It follows that the additionalstages a given sub-window passes, the higher chances that the sub-windowreally contains a face.
Next what: This model will be tested on real images and faces will bedetected.
Use-case of Viola-Jones: This model can be run on CPU, hence can beexperimented for learning purpose.
With Regards,
Ekta Smothra
Related
I am working on a hand detection project. There are many good project on web to do this, but what I need is a specific hand pose detection. It needs a totally open palm and the whole palm face to outwards, like the image below:
The first hand faces to inwards, so it will not be detected, and the right one faces to outwards, it will be detected. Now I can detect hand with OpenCV. but how to tell the hand orientation?
Problem of matching with the forehand belongs to the texture classification, it's a classic pattern recognition problem. I suggest you to try one of the following methods:
Gabor filters: it is good to detect the orientation and pixel intensities (as forehand has different features), opencv has getGaborKernel function, the very important params of this function is theta (orientation) and lambd: (frequencies). To make it simple you can apply this process on a cropped zone of palm (as you have already detected it, it would be easy to crop for example the thumb, or a rectangular zone around the gravity center..etc). Then you can convolute it with a small database of images of the same zone to get the a rate of matching, or you can use the SVM classifier, where you have to train your SVM on a set of images by constructing the training matrix needed for SVM (check this question), this paper
Local Binary Patterns (LBP): it's an important feature descriptor used for texture matching, you can apply it on whole palm image or on a cropped zone or finger of image, it's easy to use in opencv, a lot of tutorials with codes are available for this method. I recommend you to read this paper talking about Invariant Texture Classification
with Local Binary Patterns. here is a good tutorial
Haralick Texture: I've read that it works perfectly when a set of features quantifies the entire image (Global Feature Descriptors). it's not implemented in opencv but easy to be implemented, check this useful tutorial
Training Models: I've already suggested a SVM classifier, to be coupled with some descriptor, that can works perfectly.
Opencv has an interesting FaceRecognizer class for face recognition, it could be an interesting idea to use it replacing the face images by the palm ones, (do resizing and rotation to get an unique pose of palm), this class has three methods can be used, one of them is Local Binary Patterns Histograms, which is recommended for texture recognition. and why not to try the other models (Eigenfaces and Fisherfaces ) , check this tutorial
well if you go for a MacGyver way you can notice that the left hand has bones sticking out in a certain direction, while the right hand has all finger lines and a few lines in the hand palms.
These lines are always sort of the same, so you could try to detect them with opencv edge detection or hough lines. Due to the dark color of the lines, you might even be able to threshold them out of it. Then gather the information from those lines, like angles, regressions, see which features you can collect and train a simple decision tree.
That was assuming you do not have enough data, if you have then you go into deeplearning, just take a basic inceptionV3 model and retrain the last dense layer to classify between two classes with a softmax, or to predict the probablity if the hand being up/down with sigmoid. Check this link, Tensorflow got your back on the training of this one, pure already ready code to execute.
Questions? Ask away
Take a look at what leap frog has done with the oculus rift. I'm not sure what they're using internally to segment hand poses, but there is another paper that produces hand poses effectively. If you have a stereo camera setup, you can use this paper's methods: https://arxiv.org/pdf/1610.07214.pdf.
The only promising solutions I've seen for mono camera train on large datasets.
use Haar-Cascade classifier,
you can get the classifier model file then use it here.
Just search for 'Haarcascade detection of Palm in Google' or use below code.
import cv2
cam=cv2.VideoCapture(0)
ccfr2=cv2.CascadeClassifier('haar-cascade-files-master/palm.xml')
while True:
retval,image=cam.read()
grey=cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
palm=ccfr2.detectMultiScale(grey,scaleFactor=1.05,minNeighbors=3)
for x,y,w,h in palm:
image=cv2.rectangle(image,(x,y),(x+w,y+h),(256,256,256),2)
cv2.imshow("Window",image)
if cv2.waitKey(1) & 0xFF==ord('q'):
cv2.destroyAllWindows()
break
del(cam)
Best of Luck for your experience using HaarCascade.
My xmas holiday project this year was to build a little Android app, which should be able to detect arbitrary Euro coins in a picture, recognize their value and sum the values up.
My assumptions/requirements for the picture for a good recognition are
uniform background
picture should be roughly the size of a DinA4 paper
coins may not overlap, but may touch each other
number-side of the coins must be up/visible
My initial thought was, that for the coin value-recognition later it would be best to first detect the actual coins/their regions in the picture. Any recognition then would run on only these regions of the picture, where actual coins are found.
So the first step was to find circles. This i have accomplished using this OpenCV 3 pipeline, as suggested in several books and SO postings:
convert to gray
CannyEdge detection
Gauss blurring
HoughCircle detection
filtering out inner/redundant circles
The detection works rather successfully IMHO, here a picture of the result:
Coins detected with HoughCircles with blue border
Up to the recognition now for every found coin!
I searched for solutions to this problem and came up with
template matching
feature detection
machine learning
The template matching seems very inappropriate for this problem, as the coins can be arbitrary rotated with respect to a template coin (and the template matching algorithm is not rotation-invariant! so i would have to rotate the coins!).
Also pixels of the template coin will never exactly match those of the region of the formerly detected coin. So any algorithm computing the similarity will produce only poor results, i think.
Then i looked into feature detection. This seemed more appropriate to me. I detected the features of a template-coin and the candidate-coin picture and drew the matches (combination of ORB and BRUTEFORCE_HAMMING). Unfortunately the features of the template-coin were also detected in the wrong candidate coins.
See the following picture, where the template or "feature" coin is on the left, a 20 Cents coin. To the right there are the candidate coin, where the left-most coin is a 20 Cents coin. I actually expected this coin to have the most matches, unfortunately not. So again, this seems not to be a viable way to recognize the value of coins.
Feature-matches drawn between a template coin and candidate coins
So machine learning is the third possible solution. From university i still now about neural networks, how they work, etc. Unfortunately my practical knowledge is rather poor AND i don't know Support Vector Machines (SVM) at all, which is the machine learning supported by OpenCV.
So my question is actually not source-code related, but more how to setup the learning process.
Should i learn on the plain coin-images or should i first extract features and learn on the features? (i think: features)
How much positives and negatives per coin should be given?
Would i have to learn also on rotated coins or would this rotation be handled "automagically" by the SVM? So would the SVM recognize rotated coins, even if i only trained it on non-rotated coins?
One of my picture-requirements above ("DinA4") limits the size of the coin to a certain size, e.g. 1/12 of the picture-height. Should i learn on coins of roughly the same size or different sizes? I think, that different sizes would result in different features, which would not help the learning process, what do you think?
Of course, if you have a different possible solution, this is also welcome!
Any help is appreciated! :-)
Bye & Thanks!
Answering your questions:
1- Should i learn on the plain coin-images or should i first extract features and learn on the features? (i think: features)
For many object classification tasks it's better to extract the features first and then train a classifier using a learning algorithm. (e.g the features can be HOG and the learning algorithm can be something like SVM or Adaboost). It's mainly due to the fact that the features have more meaningful information compared to the pixel values. (They can describe edges,shapes, texture, etc.) However, the algorithms like deep learning will extract the useful features as a part of learning procedure.
2 - How much positives and negatives per coin should be given?
You need to answer this question depending on the variation in the classes you want to recognize and the learning algorithm you use. For SVM , if you use HOG features and want to recognize specific numbers on coins you won't need much.
3- Would i have to learn also on rotated coins or would this rotation be handled "automagically" by the SVM? So would the SVM recognize rotated coins, even if i only trained it on non-rotated coins?
Again it depends on your final decision about the features(not SVM which is the learning algorithm) you're going to choose. HOG features are not rotation invariant but there are features like SIFT or SURF which are.
4-One of my picture-requirements above ("DinA4") limits the size of the coin to a certain size, e.g. 1/12 of the picture-height. Should i learn on coins of roughly the same size or different sizes? I think, that different sizes would result in different features, which would not help the learning process, what do you think?
Again, choose your algorithm , some of them ask you for a fixed/similar width/height ratio. You can find out about the specific requirements in related papers.
If you decide to use SVM take a look at this and also if you feel ok with Neural Network, using Tensorflow is a good idea.
I'm doing my project which need to detect/classify some simple sign language.
I'm new to opencv, I have try to use contours,hull but it seem very hard to apply...
I googled and find the method call "Haarcascade" which seem to be about taking pictures and create .xml file.
So, I decide to do Haarcascade......
Here are some example of the sign language that I want to detect/classify
Set1 : http://www.uppic.org/image-B600_533D7A09.jpg
Set2 : http://www.uppic.org/image-0161_533D7A09.jpg
The result I want here is to classify these 2 set.
Any suggestion if I could use haarcascade method with this
*I'm using xcode with my webcam, but soon I'm gonna port them onto iOS device. Is it possible?
First of all: I would not use haar features for learning on whole images.
Let's see how haar features look like:
Let me point out how learning works. We're building a classifier that consists of many 'weak' classifiers. In approximation, every 'weak' classifier is built in such way to find out information about several haar features. To simplify, let's peek one of them to consideration, a first one from edge features. During learning in some way, we compute a threshold value by sliding this feature over the whole input training image, using feature as a mask: we sum pixels 'under' the white part of the feature, sum pixels 'under' black part and subtract one value from other. In our case, threshold value will give an information if vertical edge feature exists on the training image. After training of weak classifier, you repeat process with different haar features. Every weak classifier gives information about different features.
What is important: I summarized how training works to describe what kind of objects are good to be trained in such way. Let's pick the most powerful application - detecting human's face. There's an important feature of face:
It has a landmarks which are constrastive (they differ from background - skin)
The landmark's locations are correlated to each other in every face (e.g. distance between them in approximation is some factor of face size)
That makes haar features powerful in that case. As you can see, one can easily point out haar features which are useful for face detection e.g. first and second of line features are good for detection a nose.
Back to your problem, ask yourself if your problem have features 1. and 2. In case of whole image, there is too much unnecessary data - background, folds on person's shirt and we don't want to noise classifier with it.
Secondly, I would not use haar features from some cropped regions.
I think the difference between palms is too less for haar classifier. You can derive that from above description. The palms are not different so much - the computed threshold levels will be too similar. The most significant features for haar on given palms will be 'edges' between fingers and palm edges. You can;t rely on palm's edges - it depends from the background (walls, clothes etc.) And edges between fingers are carrying too less information. I am claiming that because I have an experience with learning haar classifier for palm. It started to work only if we cropped palm region containing fingers.
At the end of the introduction to this instructive kaggle competition, they state that the methods used in "Viola and Jones' seminal paper works quite well". However, that paper describes a system for binary facial recognition, and the problem being addressed is the classification of keypoints, not entire images. I am having a hard time figuring out how, exactly, I would go about adjusting the Viola/Jones system for keypoint recognition.
I assume I should train a separate classifier for each keypoint, and some ideas I have are:
iterate over sub-images of a fixed size and classify each one, where an image with a keypoint as center pixel is a positive example. In this case I'm not sure what I would do with pixels close to the edge of the image.
instead of training binary classifiers, train classifiers with l*w possible classes (one for each pixel). The big problem with this is that I suspect it will be prohibitively slow, as every weak classifier suddenly has to do l*w*original operations
the third idea I have isn't totally hashed out in my mind, but since the keypoints are each parts of a greater part of a face (left, right center of an eye, for example), maybe I could try to classify sub-images as just an eye, and then use the left, right, and center pixels (centered in the y coordinate) of the best-fit subimage for each face-part
Is there any merit to these ideas, and are there methods I haven't thought of?
however, that paper describes a system for binary facial recognition
No, read the paper carefully. What they describe is not face specific, face detection was the motivating problem. The Viola Jones paper introduced a new strategy for binary object recognition.
You could train a Viola Jones style Cascade for eyes, another for a nose, and one for each keypoint you are interested in.
Then, when you run the code - you should (hopefully) get 2 eyes, 1 nose, etc, for each face.
Provided you get the number of items you expected, you can then say "here are the key points!" What takes more work is getting enough data to build a good detector for each thing you want to detect, and gracefully handling false positives / negatives.
I ended up working on this problem extensively. I used "deep learning," aka several layers of neural networks. I used convolutional networks. You can learn more about them by checking out these demos:
http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html
http://deeplearning.net/tutorial/lenet.html#lenet
I made the following changes to a typical convolutional network:
I did not do any down-sampling, as any loss of precision directly translates to a decrease in the model's score
I did n-way binary classification, with each pixel being classified as a keypoint or non-keypoint (#2 in the things I listed in my original post). As I suspected, computational complexity was the primary barrier here. I tried to use my GPU to overcome these issues, but the number of parameters in the neural network were too large to fit in GPU memory, so I ended up using an xl amazon instance for training.
Here's a github repo with some of the work I did:
https://github.com/cowpig/deep_keypoints
Anyway, given that deep learning has blown up in popularity, there are surely people who have done this much better than I did, and published papers about it. Here's a write-up that looks pretty good:
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
I have used Haar classifier with OpenCV before succesfully. Unfortunately it seems to work only on square objects and fixed angles (i.e. faces). However I need to find "long" (rectangular) objects which have different angles (see sample input image).
Is there a way to train Haar classifier to find such objects? All I can find are tutorials for face recognition. Any other alternative approches?
Haar classifiers are known to work with rigid object only. You need a classifier for each of the view. For example, the side-face classifier in OpenCV doesn't work as good as front-face classifer(due to the reason being, side face has more variation in yaw-pitch-roll than front face).
There is no perfect way of answering your question.
However, in your case whatever you are trying to classify (microbes I suppose) are overlapping on each other. Its a complex issue. But, you can isolate the region where microbes occur (not isolate each microbe like a face).
You can refer fingerprint segmentation techniques that are known to enhance the ridges on a fingerprint (here in your case its microbe edges) from the background and isolate the image.
Check "ridgesegmentation.m" in the following page:
http://www.csse.uwa.edu.au/~pk/Research/MatlabFns/index.html