I have decided to train Haar classifier for 102 flower categories given here:(The dataset)
http://www.robots.ox.ac.uk/~vgg/data/flowers/102/categories.html
In the link you can see several categories. I am posting a few images of an individual flower to explain the question.
This flower belongs to a single class. I have 250 images as positives. There is a considerable variation in this flower's others images(of color, brightness, orientation, etc.). I am hunting for negative images right now. As you might have guessed, I didn't click these pictures so I can't go to the places where these were clicked to collect negative dataset. Instead, I have decided to extract frames from a video. Here is the link:
https://www.youtube.com/watch?v=x3zT1mJE0W0
Here are the images from the video:
It is a video of general garden with bushes and plants background.
My question is: Will this video(and other similar videos) suffice for being negative samples for successful detection? Is it safe to train the classifier for these flowers at all?(I mean with lot of variation in the background. I also plan to use the rest flowers category images as negatives that I am not detecting except the flower that I am trying to detect which in the case here is the Passion Flower).
This is my first training and I am asking this because the training is gonna eat my whole day and night. I am skeptical about it beforehand.
The trick with negative images is to use whatever you have, and as many as possible. The more difference, and quantity, of your negative images means that you will end up with a more robust classifier.
As for your specific question about whether the bushes are a good negative data set compared to the flowers I would say they will be ok. The background behind the bushes is relatively similar and you have quite a distinct flower pattern for your positive samples.
Related
My xmas holiday project this year was to build a little Android app, which should be able to detect arbitrary Euro coins in a picture, recognize their value and sum the values up.
My assumptions/requirements for the picture for a good recognition are
uniform background
picture should be roughly the size of a DinA4 paper
coins may not overlap, but may touch each other
number-side of the coins must be up/visible
My initial thought was, that for the coin value-recognition later it would be best to first detect the actual coins/their regions in the picture. Any recognition then would run on only these regions of the picture, where actual coins are found.
So the first step was to find circles. This i have accomplished using this OpenCV 3 pipeline, as suggested in several books and SO postings:
convert to gray
CannyEdge detection
Gauss blurring
HoughCircle detection
filtering out inner/redundant circles
The detection works rather successfully IMHO, here a picture of the result:
Coins detected with HoughCircles with blue border
Up to the recognition now for every found coin!
I searched for solutions to this problem and came up with
template matching
feature detection
machine learning
The template matching seems very inappropriate for this problem, as the coins can be arbitrary rotated with respect to a template coin (and the template matching algorithm is not rotation-invariant! so i would have to rotate the coins!).
Also pixels of the template coin will never exactly match those of the region of the formerly detected coin. So any algorithm computing the similarity will produce only poor results, i think.
Then i looked into feature detection. This seemed more appropriate to me. I detected the features of a template-coin and the candidate-coin picture and drew the matches (combination of ORB and BRUTEFORCE_HAMMING). Unfortunately the features of the template-coin were also detected in the wrong candidate coins.
See the following picture, where the template or "feature" coin is on the left, a 20 Cents coin. To the right there are the candidate coin, where the left-most coin is a 20 Cents coin. I actually expected this coin to have the most matches, unfortunately not. So again, this seems not to be a viable way to recognize the value of coins.
Feature-matches drawn between a template coin and candidate coins
So machine learning is the third possible solution. From university i still now about neural networks, how they work, etc. Unfortunately my practical knowledge is rather poor AND i don't know Support Vector Machines (SVM) at all, which is the machine learning supported by OpenCV.
So my question is actually not source-code related, but more how to setup the learning process.
Should i learn on the plain coin-images or should i first extract features and learn on the features? (i think: features)
How much positives and negatives per coin should be given?
Would i have to learn also on rotated coins or would this rotation be handled "automagically" by the SVM? So would the SVM recognize rotated coins, even if i only trained it on non-rotated coins?
One of my picture-requirements above ("DinA4") limits the size of the coin to a certain size, e.g. 1/12 of the picture-height. Should i learn on coins of roughly the same size or different sizes? I think, that different sizes would result in different features, which would not help the learning process, what do you think?
Of course, if you have a different possible solution, this is also welcome!
Any help is appreciated! :-)
Bye & Thanks!
Answering your questions:
1- Should i learn on the plain coin-images or should i first extract features and learn on the features? (i think: features)
For many object classification tasks it's better to extract the features first and then train a classifier using a learning algorithm. (e.g the features can be HOG and the learning algorithm can be something like SVM or Adaboost). It's mainly due to the fact that the features have more meaningful information compared to the pixel values. (They can describe edges,shapes, texture, etc.) However, the algorithms like deep learning will extract the useful features as a part of learning procedure.
2 - How much positives and negatives per coin should be given?
You need to answer this question depending on the variation in the classes you want to recognize and the learning algorithm you use. For SVM , if you use HOG features and want to recognize specific numbers on coins you won't need much.
3- Would i have to learn also on rotated coins or would this rotation be handled "automagically" by the SVM? So would the SVM recognize rotated coins, even if i only trained it on non-rotated coins?
Again it depends on your final decision about the features(not SVM which is the learning algorithm) you're going to choose. HOG features are not rotation invariant but there are features like SIFT or SURF which are.
4-One of my picture-requirements above ("DinA4") limits the size of the coin to a certain size, e.g. 1/12 of the picture-height. Should i learn on coins of roughly the same size or different sizes? I think, that different sizes would result in different features, which would not help the learning process, what do you think?
Again, choose your algorithm , some of them ask you for a fixed/similar width/height ratio. You can find out about the specific requirements in related papers.
If you decide to use SVM take a look at this and also if you feel ok with Neural Network, using Tensorflow is a good idea.
I'm trying to perform image recognition of large birds, and since the camera is moving, my usual tactic of background removal and shape recognition won't be effective. Since all adult birds of the species are remarkably similar in coloration, I was thinking that Haar classifiers may be effective, and found some resources to try and train my own classifier.
Negative images should be significantly larger than the positive images, and present similar environments but not contain the positive image. However, I haven't been able to find too many details on what makes for a good positive training image. I found some references to trying to keep a similar aspect ratio in all positive training images, but for something like a bird that can change dramatically in different poses (wings open, wings closed), is it crucial? Is it better to train multiple classifiers for each pose and orientation of the target to be recognized? Is it better to instead try to identify a subset of the object that is relatively consistent (like the bird's very distinctive black and white head)?
If I use a sub-feature like the head, does it matter if it is "mirrored" in some views? Should I artificially make my positive images face the same direction, and when running the classifier on an image, run it twice, once mirrored? What are the considerations made when designing the positive image set for a classifier?
I am wanting to count the number of cars in aerial images of parking lots. After some research I believe that Haar Cascade Classifiers might be an option for this. An example of an image I will be using would be something similar to a zoomed in image of a parking lot from Google Maps.
My current plan to accomplish this is to train a custom Haar Classifier using cars that I crop out of images in only one orientation (up and down), and then attempt recognition multiple times while rotating the image in 15 degree increments. My specific questions are:
Is using a Haar Classifier a good approach here or is there something better?
Assuming this is a good approach, when cropping cars from larger images for training data would it be better to crop a larger area that could possibly contain small portions of cars in adjacent parking spaces (although some training images would obviously include solo cars, cars with only one car next to them, etc.) or would it be best to crop the cars as close to their outline as possible?
Again assuming I am taking this approach, how could I avoid double counting cars? If a car was recognized in one orientation, I don't want it to be counted again. Is there some way that I could mark a car as counted and have it ignored?
I think in your case I would not go for Haar features, you should search for something that is rotation invariant.
I would recommend to approach this task in the following order:
Create a solid training / testing data set and have a good look into papers about getting good negative samples. In my experience good negative samples have a great deal of influence on the resulting quality of your classifier. It makes your life a lot easier if all your samples are of the same image size. Add different types of negative samples, half cars, just pavement, grass, trees, people etc...
Before starting your search for a classifier make sure that you have your evaluation pipeline in order, do a 10 fold cross evaluation with the simplest Haar classifier possible. Now you have a baseline. Try to keep the software for all features you tested working in caseou find out that your data set needs adjustment. Ideally you can just execute a script and rerun your whole evaluation on the new data set automatically.
The problem of counting cars multiple times will not be of such importance when you can find a feature that is rotation invariant. Still non maximum suppression will be in order becaus you might not get a good recognition with simple thresholding.
As a tip, you might consider HOG features, I did have some good results on cars with them.
I'm doing my project which need to detect/classify some simple sign language.
I'm new to opencv, I have try to use contours,hull but it seem very hard to apply...
I googled and find the method call "Haarcascade" which seem to be about taking pictures and create .xml file.
So, I decide to do Haarcascade......
Here are some example of the sign language that I want to detect/classify
Set1 : http://www.uppic.org/image-B600_533D7A09.jpg
Set2 : http://www.uppic.org/image-0161_533D7A09.jpg
The result I want here is to classify these 2 set.
Any suggestion if I could use haarcascade method with this
*I'm using xcode with my webcam, but soon I'm gonna port them onto iOS device. Is it possible?
First of all: I would not use haar features for learning on whole images.
Let's see how haar features look like:
Let me point out how learning works. We're building a classifier that consists of many 'weak' classifiers. In approximation, every 'weak' classifier is built in such way to find out information about several haar features. To simplify, let's peek one of them to consideration, a first one from edge features. During learning in some way, we compute a threshold value by sliding this feature over the whole input training image, using feature as a mask: we sum pixels 'under' the white part of the feature, sum pixels 'under' black part and subtract one value from other. In our case, threshold value will give an information if vertical edge feature exists on the training image. After training of weak classifier, you repeat process with different haar features. Every weak classifier gives information about different features.
What is important: I summarized how training works to describe what kind of objects are good to be trained in such way. Let's pick the most powerful application - detecting human's face. There's an important feature of face:
It has a landmarks which are constrastive (they differ from background - skin)
The landmark's locations are correlated to each other in every face (e.g. distance between them in approximation is some factor of face size)
That makes haar features powerful in that case. As you can see, one can easily point out haar features which are useful for face detection e.g. first and second of line features are good for detection a nose.
Back to your problem, ask yourself if your problem have features 1. and 2. In case of whole image, there is too much unnecessary data - background, folds on person's shirt and we don't want to noise classifier with it.
Secondly, I would not use haar features from some cropped regions.
I think the difference between palms is too less for haar classifier. You can derive that from above description. The palms are not different so much - the computed threshold levels will be too similar. The most significant features for haar on given palms will be 'edges' between fingers and palm edges. You can;t rely on palm's edges - it depends from the background (walls, clothes etc.) And edges between fingers are carrying too less information. I am claiming that because I have an experience with learning haar classifier for palm. It started to work only if we cropped palm region containing fingers.
I want to develop an application in which user input an image (of a person), a system should be able to identify face from an image of a person. System also works if there are more than one persons in an image.
I need a logic, I dont have any idea how can work on image pixel data in such a manner that it identifies person faces.
Eigenface might be a good algorithm to start with if you're looking to build a system for educational purposes, since it's relatively simple and serves as the starting point for a lot of other algorithms in the field. Basically what you do is take a bunch of face images (training data), switch them to grayscale if they're RGB, resize them so that every image has the same dimensions, make the images into vectors by stacking the columns of the images (which are now 2D matrices) on top of each other, compute the mean of every pixel value in all the images, and subtract that value from every entry in the matrix so that the component vectors won't be affine. Once that's done, you compute the covariance matrix of the result, solve for its eigenvalues and eigenvectors, and find the principal components. These components will serve as the basis for a vector space, and together describe the most significant ways in which face images differ from one another.
Once you've done that, you can compute a similarity score for a new face image by converting it into a face vector, projecting into the new vector space, and computing the linear distance between it and other projected face vectors.
If you decide to go this route, be careful to choose face images that were taken under an appropriate range of lighting conditions and pose angles. Those two factors play a huge role in how well your system will perform when presented with new faces. If the training gallery doesn't account for the properties of a probe image, you're going to get nonsense results. (I once trained an eigenface system on random pictures pulled down from the internet, and it gave me Bill Clinton as the strongest match for a picture of Elizabeth II, even though there was another picture of the Queen in the gallery. They both had white hair, were facing in the same direction, and were photographed under similar lighting conditions, and that was good enough for the computer.)
If you want to pull faces from multiple people in the same image, you're going to need a full system to detect faces, pull them into separate files, and preprocess them so that they're comparable with other faces drawn from other pictures. Those are all huge subjects in their own right. I've seen some good work done by people using skin color and texture-based methods to cut out image components that aren't faces, but these are also highly subject to variations in training data. Color casting is particularly hard to control, which is why grayscale conversion and/or wavelet representations of images are popular.
Machine learning is the keystone of many important processes in an FR system, so I can't stress the importance of good training data enough. There are a bunch of learning algorithms out there, but the most important one in my view is the naive Bayes classifier; the other methods converge on Bayes as the size of the training dataset increases, so you only need to get fancy if you plan to work with smaller datasets. Just remember that the quality of your training data will make or break the system as a whole, and as long as it's solid, you can pick whatever trees you like from the forest of algorithms that have been written to support the enterprise.
EDIT: A good sanity check for your training data is to compute average faces for your probe and gallery images. (This is exactly what it sounds like; after controlling for image size, take the sum of the RGB channels for every image and divide each pixel by the number of images.) The better your preprocessing, the more human the average faces will look. If the two average faces look like different people -- different gender, ethnicity, hair color, whatever -- that's a warning sign that your training data may not be appropriate for what you have in mind.
Have a look at the Face Recognition Hompage - there are algorithms, papers, and even some source code.
There are many many different alghorithms out there. Basically what you are looking for is "computer vision". We had made a project in university based around facial recognition and detection. What you need to do is google extensively and try to understand all this stuff. There is a bit of mathematics involved so be prepared. First go to wikipedia. Then you will want to search for pdf publications of specific algorithms.
You can go a hard way - write an implementaion of all alghorithms by yourself. Or easy way - use some computer vision library like OpenCV or OpenVIDIA.
And actually it is not that hard to make something that will work. So be brave. A lot harder is to make a software that will work under different and constantly varying conditions. And that is where google won't help you. But I suppose you don't want to go that deep.