I am trying to compare two image of drawings using corner features in the images. Here is a sample image:
Query image:
I used SIFT algorithm to compare images but it did not work because in SIFT we consider a window of 16X16 pixel to extract the features at point of interest but here in this case(drawing objects) we will get only corner points as feature points and SIFT feature descriptor will give very similar feature to all corner points and hence in the feature matching step it will reject the corners because of their close similarity scores.
So i am using below approach to compare the images.
I am using Shi-Tomasi algorithm based function in opencv ie. cv2.goodFeaturesToTrack() to find the corners(feature points) in an image. After finding corners i want to classify them in 4 categories and compare them in two images. Below is corner categories defined as of now which my vary because of huge variations in corner types(angle, no. of lines crossing at corners, irregular pixel variation at corner point):
Corner categories:
Type-1: L-shaped
Type-2: Line intersection
Type-3: Line-curve intersection
type-4: Curve-Curve intersectio
I am trying to solve this using below approach:
=> Take a patch of fixed window size surrounding the corner pixel say a window of 32X32
=> Find the gradient information ie. gradient magnitude and its direction in this window and use this information to classify the corner in above 4 classes.After going through image classification i came to know that Using HOG algorithm image gradient information can be converted to feature vectors.
=> HOG feature vector calculated in above step can be used to train SVM to get a model.
=> This model can be used for new feature point classification.
After implementing above algorithm i am getting poor accuracy.
If there is any other way to classify the corners please suggest.
Related
When using OpenCV for example, algorithms like SIFT or SURF are often used to detect keypoints. My question is what actually are these keypoints?
I understand that they are some kind of "points of interest" in an image. I also know that they are scale invariant and are circular.
Also, I found out that they have orientation but I couldn't understand what this actually is. Is it an angle but between the radius and something? Can you give some explanation? I think I need what I need first is something simpler and after that it will be easier to understand the papers.
Let's tackle each point one by one:
My question is what actually are these keypoints?
Keypoints are the same thing as interest points. They are spatial locations, or points in the image that define what is interesting or what stand out in the image. Interest point detection is actually a subset of blob detection, which aims to find interesting regions or spatial areas in an image. The reason why keypoints are special is because no matter how the image changes... whether the image rotates, shrinks/expands, is translated (all of these would be an affine transformation by the way...) or is subject to distortion (i.e. a projective transformation or homography), you should be able to find the same keypoints in this modified image when comparing with the original image. Here's an example from a post I wrote a while ago:
Source: module' object has no attribute 'drawMatches' opencv python
The image on the right is a rotated version of the left image. I've also only displayed the top 10 matches between the two images. If you take a look at the top 10 matches, these are points that we probably would want to focus on that would allow us to remember what the image was about. We would want to focus on the face of the cameraman as well as the camera, the tripod and some of the interesting textures on the buildings in the background. You see that these same points were found between the two images and these were successfully matched.
Therefore, what you should take away from this is that these are points in the image that are interesting and that they should be found no matter how the image is distorted.
I understand that they are some kind of "points of interest" of an image. I also know that they are scale invariant and I know they are circular.
You are correct. Scale invariant means that no matter how you scale the image, you should still be able to find those points.
Now we are going to venture into the descriptor part. What makes keypoints different between frameworks is the way you describe these keypoints. These are what are known as descriptors. Each keypoint that you detect has an associated descriptor that accompanies it. Some frameworks only do a keypoint detection, while other frameworks are simply a description framework and they don't detect the points. There are also some that do both - they detect and describe the keypoints. SIFT and SURF are examples of frameworks that both detect and describe the keypoints.
Descriptors are primarily concerned with both the scale and the orientation of the keypoint. The keypoints we've nailed that concept down, but we need the descriptor part if it is our purpose to try and match between keypoints in different images. Now, what you mean by "circular"... that correlates with the scale that the point was detected at. Take for example this image that is taken from the VLFeat Toolbox tutorial:
You see that any points that are yellow are interest points, but some of these points have a different circle radius. These deal with scale. How interest points work in a general sense is that we decompose the image into multiple scales. We check for interest points at each scale, and we combine all of these interest points together to create the final output. The larger the "circle", the larger the scale was that the point was detected at. Also, there is a line that radiates from the centre of the circle to the edge. This is the orientation of the keypoint, which we will cover next.
Also I found out that they have orientation but I couldn't understand what actually it is. It is an angle but between the radius and something?
Basically if you want to detect keypoints regardless of scale and orientation, when they talk about orientation of keypoints, what they really mean is that they search a pixel neighbourhood that surrounds the keypoint and figure out how this pixel neighbourhood is oriented or what direction this patch is oriented in. It depends on what descriptor framework you look at, but the general jist is to detect the most dominant orientation of the gradient angles in the patch. This is important for matching so that you can match keypoints together. Take a look at the first figure I have with the two cameramen - one rotated while the other isn't. If you take a look at some of those points, how do we figure out how one point matches with another? We can easily identify that the top of the cameraman as an interest point matches with the rotated version because we take a look at points that surround the keypoint and see what orientation all of these points are in... and from there, that's how the orientation is computed.
Usually when we want to detect keypoints, we just take a look at the locations. However, if you want to match keypoints between images, then you definitely need the scale and the orientation to facilitate this.
I'm not as familiar with SURF, but I can tell you about SIFT, which SURF is based on. I provided a few notes about SURF at the end, but I don't know all the details.
SIFT aims to find highly-distinctive locations (or keypoints) in an image. The locations are not merely 2D locations on the image, but locations in the image's scale space, meaning they have three coordinates: x, y, and scale. The process for finding SIFT keypoints is:
blur and resample the image with different blur widths and sampling rates to create a scale-space
use the difference of gaussians method to detect blobs at different scales; the blob centers become our keypoints at a given x, y, and scale
assign every keypoint an orientation by calculating a histogram of gradient orientations for every pixel in its neighborhood and picking the orientation bin with the highest number of counts
assign every keypoint a 128-dimensional feature vector based on the gradient orientations of pixels in 16 local neighborhoods
Step 2 gives us scale invariance, step 3 gives us rotation invariance, and step 4 gives us a "fingerprint" of sorts that can be used to identify the keypoint. Together they can be used to match occurrences of the same feature at any orientation and scale in multiple images.
SURF aims to accomplish the same goals as SIFT but uses some clever tricks in order to increase speed.
For blob detection, it uses the determinant of Hessian method. The dominant orientation is found by examining the horizontal and vertical responses to Haar wavelets. The feature descriptor is similar to SIFT, looking at orientations of pixels in 16 local neighborhoods, but results in a 64-dimensional vector.
SURF features can be calculated up to 3 times faster than SIFT features, yet are just as robust in most situations.
For reference:
A good SIFT tutorial
An introduction to SURF
I have some conceptual issues in understanding SURF and SIFT algorithm All about SURF. As far as my understanding goes SURF finds Laplacian of Gaussians and SIFT operates on difference of Gaussians. It then constructs a 64-variable vector around it to extract the features. I have applied this CODE.
(Q1 ) So, what forms the features?
(Q2) We initialize the algorithm using SurfFeatureDetector detector(500). So, does this means that the size of the feature space is 500?
(Q3) The output of SURF Good_Matches gives matches between Keypoint1 and Keypoint2 and by tuning the number of matches we can conclude that if the object has been found/detected or not. What is meant by KeyPoints ? Do these store the features ?
(Q4) I need to do object recognition application. In the code, it appears that the algorithm can recognize the book. So, it can be applied for object recognition. I was under the impression that SURF can be used to differentiate objects based on color and shape. But, SURF and SIFT find the corner edge detection, so there is no point in using color images as training samples since they will be converted to gray scale. There is no option of using colors or HSV in these algorithms, unless I compute the keypoints for each channel separately, which is a different area of research (Evaluating Color Descriptors for Object and Scene Recognition).
So, how can I detect and recognize objects based on their color, shape? I think I can use SURF for differentiating objects based on their shape. Say, for instance I have a 2 books and a bottle. I need to only recognize a single book out of the entire objects. But, as soon as there are other similar shaped objects in the scene, SURF gives lots of false positives. I shall appreciate suggestions on what methods to apply for my application.
The local maxima (response of the DoG which is greater (smaller) than responses of the neighbour pixels about the point, upper and lover image in pyramid -- 3x3x3 neighbourhood) forms the coordinates of the feature (circle) center. The radius of the circle is level of the pyramid.
It is Hessian threshold. It means that you would take only maximas (see 1) with values bigger than threshold. Bigger threshold lead to the less number of features, but stability of features is better and visa versa.
Keypoint == feature. In OpenCV Keypoint is the structure to store features.
No, SURF is good for comparison of the textured objects but not for shape and color. For the shape I recommend to use MSER (but not OpenCV one), Canny edge detector, not local features. This presentation might be useful
I have the following binary image:
In this binary image, there is a quadrilateral. However, due to poor pre-processing, there are some intervals in the contour of the quadrilateral, and in other words the quadrilateral is not closed. Then my question is how can I obtain the corners of the quadrilateral. If I can obtain the corners, I can perform projective correction on the original image and obtain a image with geometric distortion. Any ideas will be appreciated. The illustration image can be downloaded from https://dl.dropboxusercontent.com/u/92688392/blob.jpg.
The best way to do it is to
first detect the different lines in your image (using for example
Hough transform),
keep the 4 ones that make up your quadrilateral
(by geometric reasoning),
finally compute their intersections.
This is also a classical way to implement chessboard corner detection in camera calibration routines by the way.
Why do I claim that it is the best? Because it can handle missing corners and noise on the corner positions, since fitting a line will "invent" missing points and average the error among the detected edge pixels.
There is also some "old" (from the 80's or 90's) NASA-funded benchmark for camera calibration that backs up this claim, but it is very hard to find online (I was lucky enough to read a hard copy in one of my previous jobs).
in my project I want to crop the ROI of an image. For this I create a map with the regions of interesst. Now I want to crop the area which has the most important pixels (black is not important, white is important).
Has someone an idea how to realize it? I think this is a maximazion problem
The red border in the image below is an example how I want to crop this image
If I understood your question correctly, you have computed a value at every point in the image. These values suggests the "importance"/"interestingness"/"saliency" of each point. The matrix/image containing these values is the "map" you are referring to. Your goal is to get the bounding box for regions of interests (ROI) with high "importance" score.
The way I think you can go about segmenting the ROIs is to apply Graph Cut based segmentation computing a "score" at each pixel using your importance map. The result of the segmentation is a binary mask that masks the "important" pixels. Next, run OpenCV's findcontours function on this binary mask to get the individual connected components. Then use OpenCV's boundingRect function on the contours returned by findContours(...) to get the bounding boxes.
The good thing about using a Graph Cut based segmentation algorithm in this way is that it will join up fragmented components i.e. the resulting binary mask will tend not to have pockets of small holes even if your "importance" map is noisy.
One Graph Cut based segmentation algorithm already implemented in OpenCV is the GrabCut algorithm. A quick hack would be to apply it on your "importance" map to get the binary mask I mentioned above. A more sophisticated approach would be to build the foreground and background (color perhaps?) model using your "importance" map and passing it as input to the function. More details on GrabCut in OpenCV can be found here: http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html?highlight=grabcut#void grabCut(InputArray img, InputOutputArray mask, Rect rect, InputOutputArray bgdModel, InputOutputArray fgdModel, int iterCount, int mode)
If you would like greater flexibility, you can hack your own graphcut based segmentation algorithm using the following MRF library. This library allows you to specify your custom objective function in computing the graph cut: http://vision.middlebury.edu/MRF/code/
To use the MRF library, you will need to specify the "cost" at each point in your image indicating whether that point is "foreground" or "background". You can also think of this dichotomy as "important" or "not important" instead of "foreground" vs "background".
The MRF library's goal is to return you a label at each point such that total cost of assigning those labels is as small as possible. Hence, the game is to come up with a function to compute a small cost for points you consider important and large otherwise.
Specifically, the cost at each point is composed of 2 parts: 1) The data term/function and 2) The smoothness term/function. As mentioned earlier, the smaller the data term at each point, the more likely that point will be selected. If your "importance" score s_ij is in the range [0, 1], then a common way to compute your data term would be -log(s_ij).
The smoothness terms is a way to suggest whether 2 neighboring pixels p, q, should have the same label i.e. both "foreground", "background", or one "foreground" and the other "background". Similar to the data cost, you have to construct it such that the cost is small for neighbor pixels having similar "importance" score so that they will be assigned the same label. This term is responsible for "smoothing" the resulting mask so that you will not have pixels of low "importance" sprinkled within regions of high "importance" and vice versa. If there are such regions, OpenCV's findContours(...) function mentioned above will return contours for these regions, which can be filtered out perhaps by checking their size.
Details on functions to compute the cost can be found in the GrabCut paper: GrabCut
This blog post provides a bit more detail (and code) on creating your own graphcut segmentation algorithm in OpenCV: http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/
Another paper showing how to perform graph cut segmentation on grayscale images (your case), with better notations, and without the complicated image matting part (not implemented in OpenCV's version) in the GrabCut paper is this: Graph Cuts and Efficient N-D Image Segmentation
Hope this helps.
I'm trying to implement user-assisted edge detection using OpenCV.
Assume you have an image in which we need to find a polygonal shape. For the sake of discussion, let's say we need to find the top of a rectangular table in a picture. The user will click on the four corners of the table to help us narrow things down. Connecting those four points gives us a polygon, or four vectors.
But the user is not very accurate when clicking on those corners. So I'd like to use edge information from the image to increase the accuracy.
I'm using a Canny edge detector with a fairly high treshold to determine important edges in my image. (more precisely, I'm scaling down, blurring, converting to grayscale, then run Canny). How can I compute whether a vector aligns with an edge in my image? If I have a way to compute "alignment", my overal algorithm comes down to perturbating the location of the four edge points, computing the total "alignment" of my polygon with the edges in the image, until I find an optimum.
What is a good way to define and compute this "alignment" metric?
You may want to try to use FindContours to detect your table or any other contour. Then build a contour also from the user input points. After this you can read about Contour Moments by which you can compare contours. You can compare all the contours from the image with the one built from the user points and then select the closest match.