For a drone contest, I need to do image processing with openCV to detect an “H” (for a helicopter landing pad). I have tried some classical algorithms, but the result is not satisfying.
SIFT (and SURF): all the angles are the same (90 degrees), so even if it finds the "H", it is mistaken about the orientation.
matchTemplate: it is quite good, but it is not rotation or scale invariant, so I need to run too many tests with different sizes and orientations.
Hough Line Transform: when the drone is too far from the target or too close to it, it doesn’t detect the same lines because of their thickness.
Machine Learning for OCR: I don't know how to make it learn accurately, because the template I am searching for is unique.
Can someone give me some advice please? :)
EDIT: Here is the "H" we need to detect:
The best approach for recognising a helipad is to train a Haar classifier, and then run it on:
original image
images rotated by plus and minus 22, 45, 68, 90 degrees
A Haar classifier is trained with small rotations added, so the angles above should be enough to cover all rotations of the helipad in an image. Another approach is to train multiple classifiers for different rotations; this is more common because Haar classifiers give up at the earliest negative evidence, and it is faster to run multiple classifiers than to rotate a high-resolution image.
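A minimal sketch of the rotate-and-detect idea with OpenCV's Python API, assuming you have already trained a cascade; the file names helipad_cascade.xml and frame.jpg are placeholders:

```python
import cv2

cascade = cv2.CascadeClassifier("helipad_cascade.xml")   # hypothetical trained cascade
gray = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)     # one frame from the drone camera

h, w = gray.shape
center = (w / 2, h / 2)

for angle in (0, 22, 45, 68, 90, -22, -45, -68):
    # Rotate the frame and re-run the same classifier on it.
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(gray, M, (w, h))
    hits = cascade.detectMultiScale(rotated, scaleFactor=1.1, minNeighbors=5)
    if len(hits) > 0:
        print("helipad candidate at rotation %d: %s" % (angle, hits))
        break  # stop at the first confident detection
```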
One can also try template matching with rotations, but that will need a much larger number of rotations.
I have a set of 2000 plane images similar to the image below. The plane has a different angle in every image. The image size is 512x512, and every image always contains this same plane.
My goal is to find the angle of the plane in an image that is not in that set.
So far I tried:
Harris corner detection, but in every image Harris gives me a different number of points, even for images with very similar positions.
Hough Line Transform to find the longest line and get its inclination to the X axis.
Correlation - this method gives the best results, but it takes a really long time and the angles are only rough.
Neural network
Backpropagation trained on the Harris points and the Hough line transform, but without any success.
I also have the 3D object in an STP file, but I have no idea how to use it to solve my problem.
It would be nice to get any suggestion of a method, article or example.
In my experience, a convolutional neural network (CNN) will help you a great deal here. The performance will be great at detecting angles.
But here is the problem: depending on how you define the output and on the number of layers (no more than three should be enough), the training can be very costly. For example, you could have a single output that gives you a real number indicating the angle. Training this is still costly, but that is normal for CNNs. However, if you want 360 outputs (one for each angle in a 360-degree system), the training will be a very long and painful experience; the performance could be better, but not significantly.
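A minimal sketch of the single-real-output variant, written in Keras as one possible framework; the layer sizes and the 128x128 input are placeholder choices, not tuned for your 512x512 plane images:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),          # e.g. the planes downscaled to 128x128 grayscale
    layers.Conv2D(16, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                             # one real number: the predicted angle
])

# Plain MSE on the angle ignores the 0/360 wrap-around; predicting sin/cos of the
# angle as two outputs is a common refinement if that becomes a problem.
model.compile(optimizer="adam", loss="mse")
# model.fit(train_images, train_angles, epochs=20, validation_split=0.1)
```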
(I wanted to write this as a comment to your question first, but I don't have enough reputation to do that yet, sorry.)
I'm doing a project which needs to detect/classify some simple sign language.
I'm new to OpenCV. I have tried to use contours and hulls, but they seem very hard to apply...
I googled and found the method called "Haar cascade", which seems to be about taking pictures and creating an .xml file.
So, I decided to do a Haar cascade...
Here are some examples of the sign language that I want to detect/classify:
Set1 : http://www.uppic.org/image-B600_533D7A09.jpg
Set2 : http://www.uppic.org/image-0161_533D7A09.jpg
The result I want here is to classify these 2 sets.
Any suggestions on whether I could use the Haar cascade method for this?
*I'm using Xcode with my webcam, but soon I'm going to port this onto an iOS device. Is that possible?
First of all: I would not use Haar features for learning on whole images.
Let's see what Haar features look like:
Let me point out how learning works. We're building a classifier that consists of many 'weak' classifiers. In approximation, every 'weak' classifier is built in such a way as to extract information about several Haar features. To simplify, let's pick one of them for consideration, the first of the edge features. During learning, we compute a threshold value by sliding this feature over the whole input training image, using the feature as a mask: we sum the pixels 'under' the white part of the feature, sum the pixels 'under' the black part and subtract one value from the other. In our case, the threshold value tells us whether a vertical edge feature exists in the training image. After training a weak classifier, you repeat the process with different Haar features. Every weak classifier gives information about different features.
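To make the white-minus-black sum concrete, here is a toy Python/NumPy sketch of evaluating one vertical edge feature with an integral image; the feature position and size are arbitrary choices for illustration:

```python
import numpy as np

img = np.random.randint(0, 256, (24, 24)).astype(np.int64)   # a stand-in 24x24 training window

# Integral image padded with a leading row/column of zeros so rectangle sums are cheap.
ii = np.zeros((25, 25), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(y, x, h, w):
    """Sum of pixels in the h-by-w rectangle whose top-left corner is (y, x)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# A 2x6 vertical edge feature at (0, 0): white half on the left, black half on the right.
white = rect_sum(0, 0, 2, 3)
black = rect_sum(0, 3, 2, 3)
feature_value = white - black   # this is the value the weak classifier thresholds
print(feature_value)
```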
What is important: I summarized how training works in order to describe what kinds of objects are good to train this way. Let's pick the most powerful application - detecting a human face. A face has two important properties:
It has landmarks which are contrastive (they differ from the background - the skin)
The landmarks' locations are correlated with each other in every face (e.g. the distance between them is approximately some factor of the face size)
That makes Haar features powerful in that case. As you can see, one can easily point out Haar features which are useful for face detection, e.g. the first and second line features are good for detecting a nose.
Back to your problem: ask yourself whether your data has properties 1 and 2. In the case of a whole image there is too much unnecessary data - background, folds on the person's shirt - and we don't want to feed that noise to the classifier.
Secondly, I would not use Haar features from cropped regions either.
I think the difference between the palms is too small for a Haar classifier. You can derive that from the description above. The palms do not differ that much - the computed threshold levels will be too similar. The most significant Haar features on the given palms will be the 'edges' between the fingers and the palm edges. You can't rely on the palm's edges - they depend on the background (walls, clothes, etc.) - and the edges between the fingers carry too little information. I am claiming that because I have experience training a Haar classifier for palms. It started to work only when we cropped a palm region containing the fingers.
I'm trying to understand Viola Jones method, and I've mostly got it.
It uses simple Haar-like features boosted into strong classifiers and organized into layers/a cascade in order to achieve better performance (not bothering with obvious 'non-object' regions).
I think I understand integral image and I understand how are computed values for the features.
The only thing I can't figure out is how is algorithm dealing with the face size variations.
As far as I know, they use a 24x24 subwindow that slides over the image, and within it the algorithm goes through the classifiers and tries to figure out whether there is a face/object in it or not.
And my question is - what if one face is 10x10 size, and other 100x100? What happens then?
And I'm dying to know what these first two features (in the first layer of the cascade) are and what they look like (keeping in mind that these two features, according to Viola & Jones, will almost never miss a face, and will eliminate 60% of the incorrect ones)? How??
And how is it possible to construct these features so that they achieve those statistics for different face sizes in the image?
Am I missing something, or maybe I've figured it all wrong?
If I'm not clear enough, I'll try to explain better my confusion.
Training
The Viola-Jones classifier is trained on 24x24 images. Each of the face images contains a similarly scaled face. This produces a set of feature detectors built out of two, three, or four rectangles optimised for a particular face size.
Face size
Different face sizes are detected by repeating the classification at different scales. The original paper notes that good results are obtained by trying different scales a factor of 1.25 apart.
Note that the integral image means that it is easy to compute the rectangular features at any scale by simply scaling the coordinates of the corners of the rectangles.
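For reference, this is roughly how the multi-scale search is exposed in OpenCV: detectMultiScale regrows the search window by scaleFactor between passes, so the paper's 1.25 factor maps directly onto that parameter. The cascade file below is the one shipped with opencv-python, and the photo path is a placeholder:

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

gray = cv2.imread("group_photo.jpg", cv2.IMREAD_GRAYSCALE)
faces = cascade.detectMultiScale(gray, scaleFactor=1.25, minNeighbors=5)
for (x, y, w, h) in faces:
    # A 10x10 and a 100x100 face both come out of the same 24x24 classifier,
    # just at different scales of the sliding window.
    print("face of size %dx%d at (%d, %d)" % (w, h, x, y))
```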
Best features
The original paper contains pictures of the first two features selected in a typical cascade (see page 4).
The first feature detects the wide dark rectangle of the eyes above a wide brighter rectangle of the cheeks.
----------
----------
++++++++++
++++++++++
The second feature detects the bright thin rectangle of the bridge of the nose between the darker rectangles on either side containing the eyes.
---+++---
---+++---
---+++---
I've got a binary classification problem. I'm trying to train a neural network to recognize objects from images. Currently I have about 1500 50x50 images.
The question is whether extending my current training set with the same images flipped horizontally is a good idea or not? (The images are not symmetric.)
Thanks
I think you can do this to a much larger extent: not just flipping the images horizontally, but rotating the image in 1-degree steps. This will result in 360 samples for every instance in your training set. Depending on how fast your algorithm is, this may be a pretty good way to ensure that the algorithm isn't trained only on the images and their mirrors.
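A rough sketch of that kind of augmentation with OpenCV; the file name is a placeholder, and the black corners introduced by rotation may or may not matter for your classifier:

```python
import cv2

def rotations(img, step=1):
    """Return the image rotated in `step`-degree increments over a full circle."""
    h, w = img.shape[:2]
    center = (w / 2, h / 2)
    out = []
    for angle in range(0, 360, step):
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        out.append(cv2.warpAffine(img, M, (w, h)))
    return out

img = cv2.imread("sample_50x50.png", cv2.IMREAD_GRAYSCALE)
augmented = rotations(img)          # 360 variants of one 50x50 training image
print(len(augmented))
```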
It's possible that it's a good idea, but then again, I don't know what's the goal or the domain of the image recognition. Let's say the images contain characters and you're asking the image recognition software to determine if an image contains a forward slash / or a back slash \ then flipping the image will make your training data useless. If your domain doesn't suffer from such issues, then I'd think it's a good idea to flip them and even rotate with varying degrees.
I have used flipped images in AdaBoost with great success in the course: http://www.csc.kth.se/utbildning/kth/kurser/DD2427/bik12/Schedule.php
from the zip "TrainingImages.tar.gz".
I know there is some information on the pros/cons of using flipped images somewhere in the slides (at the homepage), but I can't find it. Also a great resource is http://www.csc.kth.se/utbildning/kth/kurser/DD2427/bik12/DownloadMaterial/FaceLab/Manual.pdf (together with the slides), going through things like finding objects at different scales and orientations.
If the image patches are not symmetric, I don't think it's a good idea to flip. A better idea is to apply some similarity transforms to the training set, within limits. Another way to increase the dataset is to add Gaussian-smoothed templates to it. Make sure that the numbers of positive and negative samples stay proportional. Too many positives and too few negatives might skew the classifier and give bad performance on the test set.
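A small sketch of the Gaussian-smoothed-template idea with OpenCV; the file paths are placeholders, and the same should be done for negatives so the class ratio stays the same:

```python
import cv2

positive_paths = ["pos_000.png", "pos_001.png"]          # placeholder file names
positives = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in positive_paths]

# Add a mildly blurred copy of every positive template to the training set.
blurred = [cv2.GaussianBlur(img, (5, 5), 1.0) for img in positives]
training_positives = positives + blurred
```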
It depends on what your NN is based on. If you are extracting rotation-invariant features or features that do not depend on the spatial position within the image (like histograms or whatever) and train your NN with these features, then rotating will not be a good idea.
If you are training directly on pixel values, then it might be a good idea.
Some more details might be useful.
How many positive and how many negative samples will I need to recognize a pattern like one of the 3 stickers on this picture:
http://i.expansys.com/i/b/b199956.jpg
Note that I'm talking about samples for creating a Haar cascade XML file for OpenCV.
Thx!
Antoine
Experimentation would be key. Hundreds would be a reasonable first guess for building proper rotational and translational invariances. Rotation would be 16 orientations (human perception limit, most template matching algorithms like these are sensitive to approx. +/- 10 degrees). Any other factors will increase sample requirements multiplicatively.
That said, I'm not sure Haar Cascades are an appropriate solution. They typically take advantage of the grey scale contrast to perform detection. The rotational and translation invariance is also built in via brute force.
By using Haar Cascades, you're throwing away a lot of the rich color information that you have.
Consider the following approach:
Some edge detection (Canny, Sobel, pick your poison)
Hough transform to solve for orientation of the rectangles
Normalize and crop the patterns.
Do color histogramming to discriminate between the three (see the sketch after this list).
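A very rough sketch of the color-histogram step, assuming the three sticker regions have already been cropped and normalised; the file names are placeholders and the hue-only histogram is one possible choice:

```python
import cv2

def hue_histogram(path):
    bgr = cv2.imread(path)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [32], [0, 180])   # hue channel only
    return cv2.normalize(hist, hist).flatten()

references = {name: hue_histogram(name + "_crop.png")
              for name in ("sticker_a", "sticker_b", "sticker_c")}
query = hue_histogram("unknown_crop.png")

# Higher correlation means a closer match between the query crop and a reference sticker.
best = max(references,
           key=lambda n: cv2.compareHist(references[n], query, cv2.HISTCMP_CORREL))
print("best match:", best)
```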