The problem is to detect a known rectangular object in an image.
Which of the following is computationally less expensive:
Finding a homography - we use a template of the known object to do feature matching, and compute the homography from the matched features.
Contour detection - We try to detect the biggest contour in the image. In this particular case we assume that the biggest contour will correspond to the known rectangular object we are trying to find.
In both cases we apply a perspective transform after detecting the object to correct the perspective.
NOTE: We are using OpenCV functions to find the homography and to detect contours.
You should try finding the biggest contour first: it is the simplest approach and will be far faster. You need to detect Canny edges, then find contours and keep the one with the biggest area. However, it can fail if the contours are unclear, or if a bigger object is present, since it does not consider shape. You can also combine both of your ideas to get better results.
EDIT:
To reply to your comment: you are comparing Canny edges + find contours + pick the biggest against detect features + match features.
I think that the first combination is less computationally expensive. Moreover, there is a good implementation of squares/rectangle detection here.
However, if the contours of the rectangle are not clear, and especially if the rectangle is highly textured, you should get better results with feature matching.
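For concreteness, here is a minimal sketch of the contour route, assuming Python with OpenCV 4.x (where findContours returns two values); the file name, Canny thresholds and output size are placeholders you would tune for your images:

import cv2
import numpy as np

img = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)          # hypothetical input image
edges = cv2.Canny(img, 50, 150)                              # thresholds are placeholders
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
biggest = max(contours, key=cv2.contourArea)                 # biggest contour by area

# Reduce the contour to a polygon; 4 vertices is taken to mean "our rectangle"
approx = cv2.approxPolyDP(biggest, 0.02 * cv2.arcLength(biggest, True), True)
if len(approx) == 4:
    # In practice you would first sort the 4 corners into a consistent order
    src = approx.reshape(4, 2).astype(np.float32)
    dst = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])  # arbitrary output size
    warped = cv2.warpPerspective(img, cv2.getPerspectiveTransform(src, dst), (400, 300))

The feature-matching route would replace everything before the perspective transform with keypoint detection, description and matching, which is where the extra cost comes from.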
Related
Can I use approxPolyDP to improve the people detection?
I am trying to detect people using BackgroundSubtractorMOG2. After I obtain the foreground of the image, I find all the contours using this function:
Imgproc.findContours(contourImg, contours, hierarchy, Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_NONE);
I iterate over each element of the contours variable, and if the contour area exceeds a specified threshold => it's a person.
My question is whether I can detect people's shapes better using approxPolyDP.
Is it possible? If so can you tell me how?
I have already applied a CLOSE morphological operation before finding contours.
My question is whether I can detect people's shapes better using approxPolyDP.
Although "better" is somewhat ambiguous, you could improve your classification by using that method. From the docs we can see:
The functions approxPolyDP approximate a curve or a polygon with another curve/polygon with less vertices so that the distance between them is less or equal to the specified precision.
That "precision" refers to the epsilon parameter, which stands for the "maximum distance between the original curve and its approximation" (also from the docs). It is basically an accuracy parameter obtained from the arc length (the lower the more precise the contour will be).
We can see from this tutorial that a way to achieve this is:
epsilon = 0.1 * cv2.arcLength(contour, True)    # here 10% of the contour's arc length
approx = cv2.approxPolyDP(contour, epsilon, True)
resulting in a simplified approximation of the contour. In that tutorial the snippet uses 10% of the arc length, and they show that an epsilon of 1% of the arc length gives a much closer approximation; you should carefully select this percentage for your specific situation.
By using these procedures you will surely obtain higher precision on your contour areas, which will better enable you to correctly classify people from other objects that have similar areas. You will also have to modify your classification criterion (the >= some_area) accordingly to correctly discriminate between people and non-people objects now that you have a more precise area.
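As a rough sketch of that classification step (in Python, mirroring the tutorial snippet; your question uses the Java API, but the calls map one-to-one), with MIN_PERSON_AREA as a hypothetical threshold you would tune:

import cv2

MIN_PERSON_AREA = 1500.0   # hypothetical threshold, tune for your camera and scene

people = []
for cnt in contours:       # "contours" comes from your findContours call
    epsilon = 0.01 * cv2.arcLength(cnt, True)       # 1% of the arc length
    approx = cv2.approxPolyDP(cnt, epsilon, True)   # simplified polygon
    if cv2.contourArea(approx) >= MIN_PERSON_AREA:  # area of the approximation, not the raw contour
        people.append(approx)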
I'm getting some information from an image with the Canny algorithm and the findContours function.
Sometimes I get too many noisy points in images that contain hair or other detailed stuff. I wonder how I can merge sufficiently close points with OpenCV. For example, I would like to merge all points whose distance from each other is less than X (i.e. sqrt(dx*dx + dy*dy) < X).
I heard that OpenCV has its own wrapper around FLANN, but I'm not sure how to use it.
And yes, I want the merging to be done across all contours, aware of each other, not inside each contour individually.
Use DBSCAN clustering on the bounding rectangles.
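A minimal sketch of that idea, assuming scikit-learn for the DBSCAN implementation (OpenCV itself does not ship one), clustering the bounding-rectangle centres for simplicity, and with X being your merge distance:

import numpy as np
import cv2
from sklearn.cluster import DBSCAN

X = 20.0   # your maximum merge distance, in pixels

# One bounding-rectangle centre per contour, taken across all contours at once
centers = np.array([[x + w / 2.0, y + h / 2.0]
                    for x, y, w, h in (cv2.boundingRect(c) for c in contours)])

labels = DBSCAN(eps=X, min_samples=1).fit_predict(centers)
# Contours sharing a label are within X of each other (directly or via a chain
# of neighbours) and can be merged, e.g. with np.vstack plus cv2.convexHull.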
When using OpenCV for example, algorithms like SIFT or SURF are often used to detect keypoints. My question is what actually are these keypoints?
I understand that they are some kind of "points of interest" in an image. I also know that they are scale invariant and are circular.
Also, I found out that they have an orientation, but I couldn't understand what this actually is. Is it an angle between the radius and something? Can you give some explanation? I think what I need first is something simpler; after that it will be easier to understand the papers.
Let's tackle each point one by one:
My question is what actually are these keypoints?
Keypoints are the same thing as interest points. They are spatial locations, or points in the image, that define what is interesting or what stands out in the image. Interest point detection is actually a subset of blob detection, which aims to find interesting regions or spatial areas in an image. The reason why keypoints are special is that no matter how the image changes... whether the image rotates, shrinks/expands, is translated (all of these would be an affine transformation, by the way...) or is subject to distortion (i.e. a projective transformation or homography), you should be able to find the same keypoints in the modified image when comparing with the original image. Here's an example from a post I wrote a while ago:
Source: 'module' object has no attribute 'drawMatches' opencv python
The image on the right is a rotated version of the left image. I've also only displayed the top 10 matches between the two images. If you take a look at the top 10 matches, these are points that we probably would want to focus on that would allow us to remember what the image was about. We would want to focus on the face of the cameraman as well as the camera, the tripod and some of the interesting textures on the buildings in the background. You see that these same points were found between the two images and these were successfully matched.
Therefore, what you should take away from this is that these are points in the image that are interesting and that they should be found no matter how the image is distorted.
I understand that they are some kind of "points of interest" of an image. I also know that they are scale invariant and I know they are circular.
You are correct. Scale invariant means that no matter how you scale the image, you should still be able to find those points.
Now we are going to venture into the descriptor part. What makes keypoints different between frameworks is the way you describe these keypoints. These are what are known as descriptors. Each keypoint that you detect has an associated descriptor that accompanies it. Some frameworks only do a keypoint detection, while other frameworks are simply a description framework and they don't detect the points. There are also some that do both - they detect and describe the keypoints. SIFT and SURF are examples of frameworks that both detect and describe the keypoints.
Descriptors are primarily concerned with both the scale and the orientation of the keypoint. We've nailed down the keypoint concept, but we need the descriptor part if our purpose is to match keypoints between different images. Now, what you mean by "circular"... that correlates with the scale that the point was detected at. Take for example this image that is taken from the VLFeat Toolbox tutorial:
You see that any points that are yellow are interest points, but some of these points have a different circle radius. These deal with scale. How interest points work in a general sense is that we decompose the image into multiple scales. We check for interest points at each scale, and we combine all of these interest points together to create the final output. The larger the "circle", the larger the scale was that the point was detected at. Also, there is a line that radiates from the centre of the circle to the edge. This is the orientation of the keypoint, which we will cover next.
Also, I found out that they have an orientation but I couldn't understand what it actually is. Is it an angle between the radius and something?
Basically, if you want to detect keypoints regardless of scale and orientation: when people talk about the orientation of a keypoint, what they really mean is that they search a pixel neighbourhood that surrounds the keypoint and figure out how this pixel neighbourhood is oriented, or what direction this patch is oriented in. It depends on which descriptor framework you look at, but the general gist is to detect the most dominant orientation of the gradient angles in the patch. This is important for matching, so that you can match keypoints together. Take a look at the first figure I have with the two cameramen - one rotated while the other isn't. If you take a look at some of those points, how do we figure out how one point matches with another? We can easily identify that the top of the cameraman, as an interest point, matches with the rotated version because we take a look at the points that surround the keypoint and see what orientation all of these points are in... and from there, that's how the orientation is computed.
Usually when we want to detect keypoints, we just take a look at the locations. However, if you want to match keypoints between images, then you definitely need the scale and the orientation to facilitate this.
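To make this concrete, here is where location, scale and orientation live in OpenCV's keypoint structure, assuming OpenCV >= 4.4 (where SIFT sits in the main module); the file name is a placeholder:

import cv2

img = cv2.imread('cameraman.png', cv2.IMREAD_GRAYSCALE)    # placeholder image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)  # detect and describe in one call

kp = keypoints[0]
print(kp.pt)      # (x, y) location of the keypoint
print(kp.size)    # diameter of the meaningful neighbourhood -> the circle's size
print(kp.angle)   # dominant orientation in degrees -> the line inside the circle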
I'm not as familiar with SURF, but I can tell you about SIFT, which SURF is based on. I provided a few notes about SURF at the end, but I don't know all the details.
SIFT aims to find highly-distinctive locations (or keypoints) in an image. The locations are not merely 2D locations on the image, but locations in the image's scale space, meaning they have three coordinates: x, y, and scale. The process for finding SIFT keypoints is:
1. Blur and resample the image with different blur widths and sampling rates to create a scale space.
2. Use the difference-of-Gaussians method to detect blobs at different scales; the blob centers become our keypoints at a given x, y, and scale.
3. Assign every keypoint an orientation by calculating a histogram of gradient orientations for every pixel in its neighborhood and picking the orientation bin with the highest count.
4. Assign every keypoint a 128-dimensional feature vector based on the gradient orientations of pixels in 16 local neighborhoods.
Step 2 gives us scale invariance, step 3 gives us rotation invariance, and step 4 gives us a "fingerprint" of sorts that can be used to identify the keypoint. Together they can be used to match occurrences of the same feature at any orientation and scale in multiple images.
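As a hedged sketch of that matching step (Python, OpenCV >= 4.4; the image names are placeholders), using a brute-force matcher with Lowe's ratio test on the 128-dimensional descriptors:

import cv2

img1 = cv2.imread('scene1.png', cv2.IMREAD_GRAYSCALE)   # placeholder images
img2 = cv2.imread('scene2.png', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_L2)                  # L2 distance suits SIFT's float descriptors
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]   # Lowe's ratio test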
SURF aims to accomplish the same goals as SIFT but uses some clever tricks in order to increase speed.
For blob detection, it uses the determinant of Hessian method. The dominant orientation is found by examining the horizontal and vertical responses to Haar wavelets. The feature descriptor is similar to SIFT, looking at orientations of pixels in 16 local neighborhoods, but results in a 64-dimensional vector.
SURF features can be calculated up to 3 times faster than SIFT features, yet are just as robust in most situations.
For reference:
A good SIFT tutorial
An introduction to SURF
I am using OpenCV to do segmentation with methods like grabcut and watershed, and then use findContours to obtain the contour. The actual contour I would like to obtain is a trapezoid, and the functions approxPolyDP and convexHull cannot do this. Can somebody give me some hints? Maybe there are other methods, rather than segmentation, to obtain it? I can think of edge detection using methods like Canny, but the result is not good because of the unconstrained background: a lot of segments have to be connected, and that is kind of hard.
The sample images are also attached (the first one shows a human shoulder). I would like to find the contour and the locations where the contour/edge changes direction, that is, the human shoulders. As in the second image, the right corner point can change, resulting in a trapezoid.
1.jpg: original image
2.jpg: the contour is labelled by hand
3.jpg: fitted lines
https://drive.google.com/folderview?id=0ByQ8kRZEPlqXUUZyaGtpSkJDeXc&usp=sharing
Thanks.
I am doing something similar to this problem:
Matching a curve pattern to the edges of an image
Basically, I have the same curve in two images, but with some affine transform between the two. Here is an example of two images:
Image1
Image2
So in order to get to Image2, you can apply some translation, rotation, scale, etc. to Image1.
Does anyone know how to solve for this transform?
Phase correlation doesn't work because it's not a translation only. Optical flow doesn't work since there's not enough detail to resolve translation, rotation, and scale (it's pretty much a binary image). I'm not sure if Hough transforms will give me good data.
I think some sort of keypoint matching algorithm like SIFT or SURF would work with this kind of data as well.
The basic idea would be to find a limited number of "interesting" keypoints in each image, then match these keypoints pairwise.
Here is a quick test of your image with an online ASIFT demo:
http://demo.ipol.im/demo/my_affine_sift/result?key=BF9F4E4E006AB5168497709836C39C74#
It is probably more suited to normal greyscale images, but nevertheless it seems to work for this data. It looks like the lines connect roughly the same points around both of the curves; plugging all these pairs into something like the findHomography function in OpenCV, the small discrepancies should even themselves out and you should get the transformation matrix between the two images.
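A rough sketch of that last step, assuming Python and stock OpenCV (ORB is used here just to keep everything in the main module; ASIFT or SIFT matches would be plugged in the same way, and the image names are placeholders):

import cv2
import numpy as np

img1 = cv2.imread('curve1.png', cv2.IMREAD_GRAYSCALE)   # placeholder names
img2 = cv2.imread('curve2.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # RANSAC rejects bad pairs
# If you know the transform is affine, cv2.estimateAffine2D(src, dst) is a tighter model.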
For your particular data you might be able to come up with better keypoint descriptors; perhaps something to detect the line ends, line crossings and sharp corners.
Or how about this: It is a little more work, but if you can vectorize your paths into a bezier or b-spline, you can get some natural keypoints from the spline descriptors.
I do not know any vectorisation library, but Inkscape has a basic implementation with which you could test the approach.
Once you have a small set of descriptors instead of a large 2d bitmap, you only need to match these descriptors between the two images, as per FindHomography.
answer to comment:
The points of interest are merely small areas that have certain properties. So the center of those areas might be black or white; the algorithm does not specifically look for white pixels or large-scale shapes such as the curve. What matters is that the lines connect roughly the same points on both curves, at least at first glance.