Number of positive and negative images needed to create a simple haarcascade - image-processing

How many positive and how many negative samples will I need to recognize a pattern like one of the 3 stickers on this picture:
http://i.expansys.com/i/b/b199956.jpg
Note: that I'm talking about samples for creating a HaarCascade file in xml for OpenCV
Thx!
Antoine

Experimentation would be key. Hundreds would be a reasonable first guess for building proper rotational and translational invariances. Rotation would be 16 orientations (human perception limit, most template matching algorithms like these are sensitive to approx. +/- 10 degrees). Any other factors will increase sample requirements multiplicatively.
That said, I'm not sure Haar Cascades are an appropriate solution. They typically take advantage of the grey scale contrast to perform detection. The rotational and translation invariance is also built in via brute force.
By using Haar Cascades, you're throwing away a lot of the rich color information that you have.
Consider the following approach:
Some edge detection (Canny, Sobel, pick your poison)
Hough transform to solve for orientation of the rectangles
Normalize and crop the patterns.
Do color histogramming to discriminate between the three.

Related

openCV H detection

For a drone contest, I need to do image processing with openCV to detect an “H” (for a helicopter landing pad). I have tried some classical algorithms, but the result is not satisfying.
SIFT (and SURF): all the angles are the same (90 degrees) so even if it finds to “H”, it is mistaken about the orientation.
matchTemplate: it is quite good, but it is not rotation and size invariant. So I need to make too many tests with different sizes and different orientations.
Hough Line Transform: when the drone is too far from the target or too close to it, it doesn’t detect the same lines because of their thickness.
Machine Learning for OCR: I ignore how to make it learn accurately because the template I am searching for is unique.
Can someone give me some advices please? :)
EDIT: Here is the "H" we need to detect:
The best approach for recognising a helipad is to train a Haar classifier, and then run it on:
original image
Images rotated by plus and minus 22, 45, 68 ,90 degrees
A Haar classifier is trained by adding small rotations, so the above angles should be good enough to cover all rotations of the helipad in an image. another approach is to train multiple classifiers for different rotations; this is more common because Haar classifiers give up with the earliest evidence, and it is fast to run multiple classifiers than rotate a high resolution image.
One can also try template matching with rotations, but that will need a much larger number of rotations.

Water Edge Detection

Is there a robust way to detect the water line, like the edge of a river in this image, in OpenCV?
(source: pequannockriver.org)
This task is challenging because a combination of techniques must be used. Furthermore, for each technique, the numerical parameters may only work correctly for a very narrow range. This means either a human expert must tune them by trial-and-error for each image, or that the technique must be executed many times with many different parameters, in order for the correct result to be selected.
The following outline is highly-specific to this sample image. It might not work with any other images.
One bit of advice: As usual, any multi-step image analysis should always begin with the most reliable step, and then proceed down to the less reliable steps. Whenever possible, the less reliable step should make use of the result of more-reliable steps to augment its own accuracy.
Detection of sky
Convert image to HSV colorspace, and find the cyan located at the upper-half of the image.
Keep this HSV image, becuase it could be handy for the next few steps as well.
Detection of shrubs
Run Canny edge detection on the grayscale version of image, with suitably chosen sigma and thresholds. This will pick up the branches on the shrubs, which would look like a bunch of noise. Meanwhile, the water surface would be relatively smooth.
Grayscale is used in this technique in order to reduce the influence of reflections on the water surface (the green and yellow reflections from the shrubs). There might be other colorspaces (or preprocessing techniques) more capable of removing that reflection.
Detection of water ripples from a lower elevation angle viewpoint
Firstly, mark off any image parts that are already classified as shrubs or sky. Since shrub detection would be more reliable than water detection, shrub detection's result should be used to inform the less-reliable water detection.
Observation
Because of the low elevation angle viewpoint, the water ripples appear horizontally elongated. In fact, every image feature appears stretched horizontally. This is called Anisotropy. We could make use of this tendency to detect them.
Note: I am not experienced in anisotropy detection. Perhaps you can get better ideas from other people.
Idea 1:
Use maximally-stable extremal regions (MSER) as a blob detector.
The Wikipedia introduction appears intimidating, but it is really related to connected-component algorithms. A naive implementation can be done similar to Dijkstra's algorithm.
Idea 2:
Notice that the image features are horizontally stretched, a simpler approach is to just sum up the absolute values of horizontal gradients and compare that to the sum of absolute values of vertical gradients.

Object Recognition by Outlines vs Features

Context:
I have the RGB-D video from a Kinect, which is aimed straight down at a table. There is a library of around 12 objects I need to identify, alone or several at a time. I have been working with SURF extraction and detection from the RGB image, preprocessing by downscaling to 320x240, grayscale, stretching the contrast and balancing the histogram before applying SURF. I built a lasso tool to choose among detected keypoints in a still of the video image. Then those keypoints are used to build object descriptors which are used to identify objects in the live video feed.
Problem:
SURF examples show successful identification of objects with a decent amount of text-like feature detail eg. logos and patterns. The objects I need to identify are relatively plain but have distinctive geometry. The SURF features found in my stills are sometimes consistent but mostly unimportant surface features. For instance, say I have a wooden cube. SURF detects a few bits of grain on one face, then fails on other faces. I need to detect (something like) that there are four corners at equal distances and right angles. None of my objects has much of a pattern but all have distinctive symmetric geometry and color. Think cellphone, lollipop, knife, bowling pin. My thought was that I could build object descriptors for each significantly different-looking orientation of the object, eg. two descriptors for a bowling pin: one standing up and one laying down. For a cellphone, one laying on the front and one on the back. My recognizer needs rotational invariance and some degree of scale invariance in case objects are stacked. Ability to deal with some occlusion is preferable (SURF behaves well enough) but not the most important characteristic. Skew invariance would be preferable and SURF does well with paper printouts of my objects held by hand at a skew.
Questions:
Am I using the wrong SURF parameters to find features at the wrong scale? Is there a better algorithm for this kind of object identification? Is there something as readily usable as SURF that uses the depth data from the Kinect along with or instead of the RGB data?
I was doing something similar for a project, and ended up using a super simple method for object recognition, which was using OpenCV blob detection, and recognizing objects based on their areas. Obviously, there needs to be enough variance for this method to work.
You can see my results here: http://portfolio.jackkalish.com/Secondhand-Stories
I know there are other methods out there, one possible solution for you could be approxPolyDP, which is described here:
How to detect simple geometric shapes using OpenCV
Would love to hear about your progress on this!

How to verify the correctness of calibration of a webcam?

I am totally new to camera calibration techniques... I am using OpenCV chessboard technique... I am using a webcam from Quantum...
Here are my observations and steps..
I have kept each chess square side = 3.5 cm. It is a 7 x 5 chessboard with 6 x 4 internal corners. I am taking total of 10 images in different views/poses at a distance of 1 to 1.5 m from the webcam.
I am following the C code in Learning OpenCV by Bradski for the calibration.
my code for calibration is
cvCalibrateCamera2(object_points,image_points,point_counts,cvSize(640,480),intrinsic_matrix,distortion_coeffs,NULL,NULL,CV_CALIB_FIX_ASPECT_RATIO);
Before calling this function I am making the first and 2nd element along the diagonal of the intrinsic matrix as one to keep the ratio of focal lengths constant and using CV_CALIB_FIX_ASPECT_RATIO
With the change in distance of the chess board the fx and fy are changing with fx:fy almost equal to 1. there are cx and cy values in order of 200 to 400. the fx and fy are in the order of 300 - 700 when I change the distance.
Presently I have put all the distortion coefficients to zero because I did not get good result including distortion coefficients. My original image looked handsome than the undistorted one!!
Am I doing the calibration correctly?. Should I use any other option than CV_CALIB_FIX_ASPECT_RATIO?. If yes, which one?
Hmm, are you looking for "handsome" or "accurate"?
Camera calibration is one of the very few subjects in computer vision where accuracy can be directly quantified in physical terms, and verified by a physical experiment. And the usual lesson is that (a) your numbers are just as good as the effort (and money) you put into them, and (b) real accuracy (as opposed to imagined) is expensive, so you should figure out in advance what your application really requires in the way of precision.
If you look up the geometrical specs of even very cheap lens/sensor combinations (in the megapixel range and above), it becomes readily apparent that sub-sub-mm calibration accuracy is theoretically achievable within a table-top volume of space. Just work out (from the spec sheet of your camera's sensor) the solid angle spanned by one pixel - you'll be dazzled by the spatial resolution you have within reach of your wallet. However, actually achieving REPEATABLY something near that theoretical accuracy takes work.
Here are some recommendations (from personal experience) for getting a good calibration experience with home-grown equipment.
If your method uses a flat target ("checkerboard" or similar), manufacture a good one. Choose a very flat backing (for the size you mention window glass 5 mm thick or more is excellent, though obviously fragile). Verify its flatness against another edge (or, better, a laser beam). Print the pattern on thick-stock paper that won't stretch too easily. Lay it after printing on the backing before gluing and verify that the square sides are indeed very nearly orthogonal. Cheap ink-jet or laser printers are not designed for rigorous geometrical accuracy, do not trust them blindly. Best practice is to use a professional print shop (even a Kinko's will do a much better job than most home printers). Then attach the pattern very carefully to the backing, using spray-on glue and slowly wiping with soft cloth to avoid bubbles and stretching. Wait for a day or longer for the glue to cure and the glue-paper stress to reach its long-term steady state. Finally measure the corner positions with a good caliper and a magnifier. You may get away with one single number for the "average" square size, but it must be an average of actual measurements, not of hopes-n-prayers. Best practice is to actually use a table of measured positions.
Watch your temperature and humidity changes: paper adsorbs water from the air, the backing dilates and contracts. It is amazing how many articles you can find that report sub-millimeter calibration accuracies without quoting the environment conditions (or the target response to them). Needless to say, they are mostly crap. The lower temperature dilation coefficient of glass compared to common sheet metal is another reason for preferring the former as a backing.
Needless to say, you must disable the auto-focus feature of your camera, if it has one: focusing physically moves one or more pieces of glass inside your lens, thus changing (slightly) the field of view and (usually by a lot) the lens distortion and the principal point.
Place the camera on a stable mount that won't vibrate easily. Focus (and f-stop the lens, if it has an iris) as is needed for the application (not the calibration - the calibration procedure and target must be designed for the app's needs, not the other way around). Do not even think of touching camera or lens afterwards. If at all possible, avoid "complex" lenses - e.g. zoom lenses or very wide angle ones. For example, anamorphic lenses require models much more complex than stock OpenCV makes available.
Take lots of measurements and pictures. You want hundreds of measurements (corners) per image, and tens of images. Where data is concerned, the more the merrier. A 10x10 checkerboard is the absolute minimum I would consider. I normally worked at 20x20.
Span the calibration volume when taking pictures. Ideally you want your measurements to be uniformly distributed in the volume of space you will be working with. Most importantly, make sure to angle the target significantly with respect to the focal axis in some of the pictures - to calibrate the focal length you need to "see" some real perspective foreshortening. For best results use a repeatable mechanical jig to move the target. A good one is a one-axis turntable, which will give you an excellent prior model for the motion of the target.
Minimize vibrations and associated motion blur when taking photos.
Use good lighting. Really. It's amazing how often I see people realize late in the game that you need a generous supply of photons to calibrate a camera :-) Use diffuse ambient lighting, and bounce it off white cards on both sides of the field of view.
Watch what your corner extraction code is doing. Draw the detected corner positions on top of the images (in Matlab or Octave, for example), and judge their quality. Removing outliers early using tight thresholds is better than trusting the robustifier in your bundle adjustment code.
Constrain your model if you can. For example, don't try to estimate the principal point if you don't have a good reason to believe that your lens is significantly off-center w.r.t the image, just fix it at the image center on your first attempt. The principal point location is usually poorly observed, because it is inherently confused with the center of the nonlinear distortion and by the component parallel to the image plane of the target-to-camera's translation. Getting it right requires a carefully designed procedure that yields three or more independent vanishing points of the scene and a very good bracketing of the nonlinear distortion. Similarly, unless you have reason to suspect that the lens focal axis is really tilted w.r.t. the sensor plane, fix at zero the (1,2) component of the camera matrix. Generally speaking, use the simplest model that satisfies your measurements and your application needs (that's Ockam's razor for you).
When you have a calibration solution from your optimizer with low enough RMS error (a few tenths of a pixel, typically, see also Josh's answer below), plot the XY pattern of the residual errors (predicted_xy - measured_xy for each corner in all images) and see if it's a round-ish cloud centered at (0, 0). "Clumps" of outliers or non-roundness of the cloud of residuals are screaming alarm bells that something is very wrong - likely outliers due to bad corner detection or matching, or an inappropriate lens distortion model.
Take extra images to verify the accuracy of the solution - use them to verify that the lens distortion is actually removed, and that the planar homography predicted by the calibrated model actually matches the one recovered from the measured corners.
This is a rather late answer, but for people coming to this from Google:
The correct way to check calibration accuracy is to use the reprojection error provided by OpenCV. I'm not sure why this wasn't mentioned anywhere in the answer or comments, you don't need to calculate this by hand - it's the return value of calibrateCamera. In Python it's the first return value (followed by the camera matrix, etc).
The reprojection error is the RMS error between where the points would be projected using the intrinsic coefficients and where they are in the real image. Typically you should expect an RMS error of less than 0.5px - I can routinely get around 0.1px with machine vision cameras. The reprojection error is used in many computer vision papers, there isn't a significantly easier or more accurate way to determine how good your calibration is.
Unless you have a stereo system, you can only work out where something is in 3D space up to a ray, rather than a point. However, as one can work out the pose of each planar calibration image, it's possible to work out where each chessboard corner should fall on the image sensor. The calibration process (more or less) attempts to work out where these rays fall and minimises the error over all the different calibration images. In Zhang's original paper, and subsequent evaluations, around 10-15 images seems to be sufficient; at this point the error doesn't decrease significantly with the addition of more images.
Other software packages like Matlab will give you error estimates for each individual intrinsic, e.g. focal length, centre of projection. I've been unable to make OpenCV spit out that information, but maybe it's in there somewhere. Camera calibration is now native in Matlab 2014a, but you can still get hold of the camera calibration toolbox which is extremely popular with computer vision users.
http://www.vision.caltech.edu/bouguetj/calib_doc/
Visual inspection is necessary, but not sufficient when dealing with your results. The simplest thing to look for is that straight lines in the world become straight in your undistorted images. Beyond that, it's impossible to really be sure if your cameras are calibrated well just by looking at the output images.
The routine provided by Francesco is good, follow that. I use a shelf board as my plane, with the pattern printed on poster paper. Make sure the images are well exposed - avoid specular reflection! I use a standard 8x6 pattern, I've tried denser patterns but I haven't seen such an improvement in accuracy that it makes a difference.
I think this answer should be sufficient for most people wanting to calibrate a camera - realistically unless you're trying to calibrate something exotic like a Fisheye or you're doing it for educational reasons, OpenCV/Matlab is all you need. Zhang's method is considered good enough that virtually everyone in computer vision research uses it, and most of them either use Bouguet's toolbox or OpenCV.

How to get to shoulders localization on image photo?

I have to localize from video where the shoulders of a person are in a movie.
Do have any advice on how to get to this?
I thought about corner detection or some kind of shape detection. But I'm still not sure what next. We can treat video like image sequence (I wrote this, but I think is obvious)?
Luckily, the shoulders are usually attached to the head...
I have used the Dalal-Triggs algorithm (Wikipedia) to detect head+shoulders of all persons facing the camera.
Basically, you train a linear SVM on positive examples in which the head+shoulders are marked, and on negative examples that do not contain these body parts. The descriptor is a Histogram of Gradients (HOG) which tells you what edge directions are dominant in each cell of the descriptor. I found that their normalization scheme is very important in dealing with non-uniform lighting.
With enough examples, the linear SVM will provide you with a plane normal that can be interpreted as a descriptor: you can visualize the meaning of the positive weights, and see that they outline the profile of head+shoulders. Likewise, the negative weights will belong to the areas outside the body, and/or directions orthogonal to the profile edges.
You can apply the linear SVM classifier on each image efficiently at multiple scales and aspect ratios, and find the image patch with best response. This should give you the location of the head and shoulders (it will not be exact, though)

Resources