I'd like to know the best strategy for comparing groups of contours (in fact, edges resulting from Canny edge detection) from two pictures, in order to know which pair is most alike.
I have this image:
http://i55.tinypic.com/10fe1y8.jpg
And I would like to know how can I calculate which one of these fits best to it:
http://i56.tinypic.com/zmxd13.jpg
(it should be the one on the right)
Is there any way to compare the contours as a whole?
I can easily rotate the images but I don't know what functions to use in order to calculate that the reference image on the right is the best fit.
Here is what I've already tried using OpenCV:
matchShapes function - I tried this function using two grayscale images, and I always get the same result for every comparison image; the value also seems wrong, as it is 0.0002.
What I realized about matchShapes (though I'm not sure it's the correct assumption) is that the function works on pairs of contours, not full images. This is a problem because, although I have the contours of the images I want to compare, there are hundreds of them and I don't know which ones should be "paired up".
So I also tried comparing all the contours of the first image against the other two with a for loop, but I might be comparing, for example, the contour of the "5" against the circle contour of the two reference images and not against the "2" contour.
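To illustrate what pairing contours for matchShapes could look like, here is a minimal sketch that compares only the largest outer contour of each image (the file names are placeholders, and "largest contour" is just one possible pairing rule):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

// Sketch only: extract the largest outer contour of a grayscale image.
static std::vector<cv::Point> largestContour(const cv::Mat& gray)
{
    cv::Mat edges;
    cv::Canny(gray, edges, 50, 150);
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(edges, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    size_t best = 0;
    double bestArea = 0.0;
    for (size_t i = 0; i < contours.size(); ++i) {
        double area = cv::contourArea(contours[i]);
        if (area > bestArea) { bestArea = area; best = i; }
    }
    return contours.empty() ? std::vector<cv::Point>() : contours[best];
}

int main()
{
    cv::Mat img = cv::imread("query.png", cv::IMREAD_GRAYSCALE);      // placeholder
    cv::Mat ref = cv::imread("reference.png", cv::IMREAD_GRAYSCALE);  // placeholder
    double score = cv::matchShapes(largestContour(img), largestContour(ref),
                                   cv::CONTOURS_MATCH_I1, 0.0);
    std::cout << "matchShapes score (lower = more similar): " << score << std::endl;
    return 0;
}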
I also tried the simple cv::compare function and matchTemplate, neither with success.
Well, for this you have a couple of options depending on how robust you need your approach to be.
Simple Solutions (with assumptions):
For these methods, I'm assuming the images you supplied are what you are working with (i.e., the objects are already segmented and at approximately the same scale). You will also need to correct the rotation, at least in a coarse manner. You might do something like iteratively rotating the comparison image every 10, 30, 60, or 90 degrees, or whatever coarseness you feel you can get away with.
For example,
bestMetric = infinity   // SAD: a smaller value means a better match
for(degrees = 0; degrees < 360; degrees += 10)
    coinRot = rotate(compareCoin, degrees)
    // you could also try Cosine Similarity, or even matchTemplate here.
    metric = SAD(coinRot, targetCoin)
    if(metric < bestMetric)
        bestMetric = metric
        coinRotation = degrees
Sum of Absolute Differences (SAD): This will allow you to quickly compare the images once you have determined an approximate rotation angle.
Cosine Similarity: This operates a bit differently by treating the image as a 1D vector, and then computes the high-dimensional angle between the two vectors. The better the match, the smaller the angle will be.
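A rough OpenCV sketch of the loop above, assuming both coin images are already grayscale, segmented, and the same size (the file names are placeholders):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <limits>

int main()
{
    cv::Mat target  = cv::imread("targetCoin.png", cv::IMREAD_GRAYSCALE);   // placeholder
    cv::Mat compare = cv::imread("compareCoin.png", cv::IMREAD_GRAYSCALE);  // placeholder

    cv::Point2f center(compare.cols / 2.0f, compare.rows / 2.0f);
    double bestMetric = std::numeric_limits<double>::max();
    int coinRotation = 0;

    for (int degrees = 0; degrees < 360; degrees += 10) {
        // Rotate the comparison image about its center by the current angle.
        cv::Mat R = cv::getRotationMatrix2D(center, degrees, 1.0);
        cv::Mat coinRot;
        cv::warpAffine(compare, coinRot, R, compare.size());

        // Sum of Absolute Differences: smaller means a better match.
        cv::Mat diff;
        cv::absdiff(coinRot, target, diff);
        double metric = cv::sum(diff)[0];

        if (metric < bestMetric) {
            bestMetric = metric;
            coinRotation = degrees;
        }
    }

    std::cout << "best rotation: " << coinRotation << " degrees, SAD = " << bestMetric << std::endl;
    return 0;
}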
Complex Solutions (possibly more robust):
These solutions will be more complex to implement, but will probably yield more robust classifications.
Hausdorff Distance: This answer will give you an introduction to using this method. This solution will probably also need the rotation correction to work properly.
Fourier-Mellin Transform: This method is an extension of Phase Correlation, which can extract the rotation, scale, and translation (RST) transform between two images.
Feature Detection and Extraction: This method involves detecting "robust" (i.e., scale and/or rotation invariant) features in the image and comparing them against a set of target features with RANSAC, LMedS, or simple least squares. OpenCV has a couple of samples using this technique in matcher_simple.cpp and matching_to_many_images.cpp. NOTE: With this method you will probably not want to binarize the image, so there are more detectable features available.
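As a rough illustration of the feature-based route (not the exact code from those samples; ORB keypoints with a brute-force Hamming matcher are used here as one possible choice, and the file names are placeholders):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    // Non-binarized grayscale images, so more features are detectable.
    cv::Mat img1 = cv::imread("query.png", cv::IMREAD_GRAYSCALE);      // placeholder
    cv::Mat img2 = cv::imread("reference.png", cv::IMREAD_GRAYSCALE);  // placeholder

    // Detect keypoints and compute descriptors (ORB is rotation invariant).
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(img1, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(img2, cv::noArray(), kp2, desc2);

    // Brute-force Hamming matcher with cross-check for binary descriptors.
    cv::BFMatcher matcher(cv::NORM_HAMMING, true);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    // A crude similarity cue: the reference with more (and tighter) matches is more alike.
    std::cout << "number of matches: " << matches.size() << std::endl;
    return 0;
}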
Related
I'm trying to extract the geometries of the papers in the image below, but I'm having some trouble grabbing the contours. I don't know which threshold algorithm to use (here I used a static threshold of 10, which is probably not ideal).
And as you can see, I can get the correct number of images, but I can't get the proper bounds using this method.
Simply applying Otsu just doesn't work; it doesn't capture the geometries.
I assume I need to apply some edge detection, but I'm not sure what to do once I apply Canny or some other edge detector.
I also tried Sobel in both directions (+ve and -ve in x and y), but I'm unsure how to extract the contours from there.
How do I grab these contours?
Below are some previews of the images at different stages of the process leading to the final convex hull results.
[Image previews: Original Image; Sharpened; Dilate, Sharpen, Erode, Sharpen; Convex hulls of the approximated polygons (which don't fully capture the desired regions)]
Sorry in advance about the horrible formatting; I have no idea how to make the images smaller or title them nicely on Stack Overflow.
Suppose that the Canny edge detector successfully detects an edge in an image. The edge is then rotated by θ, where the relationship between a point (x, y) on the original edge and the corresponding point (x′, y′) on the rotated edge is x′ = x·cosθ − y·sinθ; y′ = x·sinθ + y·cosθ.
Will the rotated edge be detected using the same Canny edge detector?
(I think the answer should be found by considering that the detection of an edge by the Canny edge detector depends only on the magnitude of its derivative.)
The answer is both yes and no, and which one you go for depends on how literally you take the question.
First of all, we're dealing with a rectangular grid, so given an integer location (x,y), the corresponding point (x',y') in a rotated image is very likely not at an integer location. And considering that the output of Canny is a set of points, not a smooth function that can be interpolated, it would be difficult to establish a correspondence between the set resulting from the rotated image and the one resulting from the original image.
Think for example about the number of pixels on a discrete line of a given length at 0 degrees and at 45 degrees. (Hint: the line at 45 degrees has sqrt(2) times fewer pixels.)
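As a concrete (made-up) example: a horizontal segment from (0,0) to (100,0) covers 101 pixels, while the same segment rotated by 45 degrees runs from (0,0) to roughly (71,71) and is rasterized with about 72 pixels when diagonal steps are allowed, i.e. roughly sqrt(2) times fewer.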
But if you take the question more generally and interpret it as "will an edge that is detected in the original image also be detected after rotating the image by θ degrees?" then the answer is yes, in theory.
Of course practice is always a bit different than theory. The details of the implementation matter here. And there is always numerical imprecision to contend with.
Let's start by assuming the rotation is computed correctly, with a precise interpolation scheme (cubic, Lanczos) and not rounded after to uint8 or something (i.e. we're computing using floating-point values).
If you read the original paper by Canny, you'll see he proposes using Gaussian derivatives as the best compromise between compact support and computational precision. I have seen few implementations that actually do. Typically I see a convolution with a Gaussian and then Sobel derivatives. Especially for smaller sigmas (less smoothing) the difference can be quite large. Gaussian derivatives are rotationally invariant, Sobel derivatives are not.
The next step in the algorithm is non-maximum suppression. This is where the continuous gradient is converted to a set of points. For each pixel, it checks to see if it is a local maximum in the direction of the gradient. Because this is done per pixel, a different set of locations are tested in the rotated image compared to the original. Nonetheless, it should detect points along the same ridges in both cases.
Next, a hysteresis threshold is applied. This is a two-threshold operation that keeps pixels above one threshold as long as at least one pixel above a second threshold is present in the same connected component. This is where differences could occur between the rotated and the original image. Remember we're dealing with a set of pixels: we have sampled the continuous gradient function at discrete points. There could be an edge that has one pixel above the second threshold in one version of the image but not in the other. This would only occur for edges very close to the chosen threshold, of course.
Next comes a thinning. Because the non-maximum suppression can yield points along a thicker line, a thinning operation is applied that removes pixels from the set that are not needed to maintain connectivity of the lines. Which pixels are selected here will also differ between rotated and original images, but this does not change the geometry of the solution, so we still have the same set of points.
So, the answer is yes and no. :)
Note that the same logic applies to translation.
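If you want to check this empirically, a quick and purely illustrative experiment is to rotate an image with a good interpolator, run Canny on both versions, rotate one edge map back, and measure how well the two edge sets overlap (edges near the corners that rotate out of the frame will count against the score; the thresholds and file name are arbitrary):

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE);  // placeholder

    double theta = 30.0;  // arbitrary rotation angle in degrees
    cv::Point2f center(img.cols / 2.0f, img.rows / 2.0f);
    cv::Mat R    = cv::getRotationMatrix2D(center, theta, 1.0);
    cv::Mat Rinv = cv::getRotationMatrix2D(center, -theta, 1.0);

    // Rotate with cubic interpolation (Canny still needs an 8-bit input).
    cv::Mat rotated;
    cv::warpAffine(img, rotated, R, img.size(), cv::INTER_CUBIC);

    cv::Mat edgesOrig, edgesRot, edgesRotBack;
    cv::Canny(img, edgesOrig, 50, 150);
    cv::Canny(rotated, edgesRot, 50, 150);

    // Bring the second edge map back into the original frame.
    cv::warpAffine(edgesRot, edgesRotBack, Rinv, img.size(), cv::INTER_NEAREST);

    // Count original edge pixels that have a rotated-edge pixel nearby;
    // a small dilation absorbs the unavoidable one-pixel differences.
    cv::Mat dilated;
    cv::dilate(edgesRotBack, dilated, cv::Mat(), cv::Point(-1, -1), 2);
    cv::Mat both = edgesOrig & dilated;
    double overlap = cv::countNonZero(both);
    double total   = cv::countNonZero(edgesOrig);
    std::cout << "edge pixels recovered: " << 100.0 * overlap / total << "%" << std::endl;
    return 0;
}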
I am doing a project in OpenCV to detect handwritten characters from a user-filled form. I have made an algorithm to detect the skew angle of the scanned image using the Hough Line Transform. But it does not work when the image is rotated by 180 degrees, since 0 and 180 degrees are treated as the same by the Hough Line function. My image contains some rectangles to fill data into and some text. So how do I detect whether a scanned image is rotated by 180 degrees or not?
Since I have to correct the skew angle of the image first, and only then can I detect exactly where on the image the user-filled data (which I need to extract) lies, using the rectangle coordinates from the empty template form provided earlier, answers that do not use character recognition are appreciated.
To lift the 180° ambiguity, only OCR can tell you: perform two reads on the deskewed text, one using the given angle, the other using the angle + 180°, and keep the most successful read.
Unless you have some a priori information it's the only way, as other image processing operations don't know about characters.
UPDATE:
Some strings are forever ambiguous, like 0689HINOSXZ <=> ZXSONIH6890.
If the layout of the text is known (boxes) and asymmetric, it is a relatively easy matter to check matching of the text strings to the layout: choose a box (such as the topmost) and a string (the topmost), and align them by translation; then see how the other boxes and strings match (using a nearest neighbor rule) and establish the correspondences. Compare results with the straight and flipped layout, and keep the best overall area of overlap.
For reliability, it can be better to try more than a starting box/string pair, as there can be some ambiguity to which is the topmost (it could even be missing).
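A toy sketch of that box/string alignment idea (the coordinates below are made up; in practice the layout points would come from the template boxes and the detections from the centroids of text/boxes found in the deskewed scan):

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <iostream>
#include <limits>
#include <vector>

// Translate the layout so its topmost point sits on the topmost detection,
// then count layout points that have a detection within `tol` pixels.
static int countMatches(const std::vector<cv::Point2f>& layout,
                        const std::vector<cv::Point2f>& detected, float tol)
{
    auto topmost = [](const std::vector<cv::Point2f>& pts) {
        return *std::min_element(pts.begin(), pts.end(),
            [](const cv::Point2f& a, const cv::Point2f& b) { return a.y < b.y; });
    };
    cv::Point2f shift = topmost(detected) - topmost(layout);
    int matches = 0;
    for (cv::Point2f p : layout) {
        p += shift;
        float best = std::numeric_limits<float>::max();
        for (const cv::Point2f& d : detected)
            best = std::min(best, (float)std::hypot(d.x - p.x, d.y - p.y));
        if (best < tol) ++matches;
    }
    return matches;
}

int main()
{
    // Made-up data: centers of the boxes in the template layout, and centroids detected in the scan.
    std::vector<cv::Point2f> layout   = { {100, 80}, {400, 80}, {100, 300}, {250, 500} };
    std::vector<cv::Point2f> detected = { {110, 85}, {410, 85}, {105, 310}, {255, 505} };
    cv::Size2f page(600, 800);

    // The 180-degree hypothesis: flip the layout about the page center.
    std::vector<cv::Point2f> flipped;
    for (const cv::Point2f& p : layout)
        flipped.push_back(cv::Point2f(page.width - p.x, page.height - p.y));

    int straightScore = countMatches(layout, detected, 20.0f);
    int flippedScore  = countMatches(flipped, detected, 20.0f);
    std::cout << (straightScore >= flippedScore ? "upright" : "rotated 180 degrees")
              << " (" << straightScore << " vs " << flippedScore << ")" << std::endl;
    return 0;
}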
Isn't your problem more general? Let's say, you detect a skew angle of +45 degrees and rotate the image by -45 degrees. Then it could still be that the image is rotated by 180 degrees because it was not rotated +45 degrees but -135 instead.
Anyway, to the actual question: I am not an expert in character recognition, but if you use it in your application anyway, couldn't you just try character recognition for both rotations and then choose the one that gets the stronger response?
If you match the rectangles in your template with those of the skew-corrected image, you'll be able to get the correct orientation (but only if there's no symmetry in the placement of those rectangles). For matching, you may be able to use the rectangles in your template as a mask to extract regions from the skew-corrected image.
EDIT
Suppose your template and the skew-corrected image look like this (in the best case, where there are no displacements in the skew-corrected image):
Then you can use the template as a mask to copy data from the skew-corrected image. Then check what fraction of the white pixels in the template is contained in the copied image. This value will be very low for a 180-degree rotated image.
But as you say, this won't work in practice because of the displacements. Then maybe you can try template matching (cross-correlation), in which you use the template image as the template. The location of the strongest peak and its strength would give you some indication of the orientation. You can perform the template matching at a reduced resolution so it runs faster.
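A rough sketch of that idea (placeholder file names; both images are downscaled so matchTemplate runs faster, and the template must be no larger than the scan):

#include <opencv2/opencv.hpp>
#include <iostream>

// Strongest normalized correlation peak of tmpl inside img.
static double bestMatch(const cv::Mat& img, const cv::Mat& tmpl)
{
    cv::Mat result;
    cv::matchTemplate(img, tmpl, result, cv::TM_CCOEFF_NORMED);
    double minVal, maxVal;
    cv::minMaxLoc(result, &minVal, &maxVal);
    return maxVal;
}

int main()
{
    cv::Mat scan = cv::imread("deskewed_form.png", cv::IMREAD_GRAYSCALE);   // placeholder
    cv::Mat tmpl = cv::imread("empty_template.png", cv::IMREAD_GRAYSCALE);  // placeholder

    // Reduced resolution for speed.
    cv::resize(scan, scan, cv::Size(), 0.25, 0.25);
    cv::resize(tmpl, tmpl, cv::Size(), 0.25, 0.25);

    cv::Mat scanFlipped;
    cv::rotate(scan, scanFlipped, cv::ROTATE_180);

    double upright = bestMatch(scan, tmpl);
    double flipped = bestMatch(scanFlipped, tmpl);
    std::cout << (upright >= flipped ? "upright" : "rotated 180 degrees")
              << " (" << upright << " vs " << flipped << ")" << std::endl;
    return 0;
}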
You could try to match keypoints (Harris, SIFT, ...) from the scanned image and the empty template. With the matched points you can easily find a transformation to align the scanned image with the template. This may work for your case, but you are more likely to succeed if there are some textured logos in the images, as is usually the case for forms.
Can't you simply compute two cross-correlations, one with a 180-degree rotation and one without? The one with the matching rectangles should give you a higher correlation maximum (provided the image contrast of the remaining page is not too misleading, but some pre-filtering could help here).
I am trying to find the contours of an image; before that, I am applying the Canny edge detector.
It gives different results for different images. For one image it gives perfect contours at threshold values of min = 40 and max = 240, and for another image it's 30-120.
I want to make it generic.
In layman's terms, edge detection needs a threshold to tell what difference/change should be counted as an edge. For details, read here.
So the edges depend on the content of the image, i.e. the level of brightness/darkness/contrast.
I suggest you simply find the mean of the whole grayscale image and take the thresholds as follows:
min_threshold = 0.66 * mean
max_threshold = 1.33 * mean
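In OpenCV terms, a minimal sketch of this rule (grayscale input assumed; the file name is a placeholder):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("input.png", cv::IMREAD_GRAYSCALE);  // placeholder

    // Derive both Canny thresholds from the mean intensity of the whole image.
    double mean = cv::mean(gray)[0];
    cv::Mat edges;
    cv::Canny(gray, edges, 0.66 * mean, 1.33 * mean);

    cv::imwrite("edges.png", edges);
    return 0;
}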
I have tested it and it gives impressive results. You can use the median instead of the mean, with almost the same result. Another alternative is to first equalize the image and then try a threshold of your choice, found experimentally.
But again, I strongly recommend trying the mean method. In case of any query, write here.
Happy Coding :)
What is the distance transform? What is the theory behind it? If I have two similar images but in different positions, how does the distance transform help in overlapping them? The results that the distance transform function produces look as if divided in the middle; is that to find the center of one image so that the other is overlapped just halfway? I have looked into the OpenCV documentation, but it's still not clear.
Look at the picture below (you may want to increase your monitor brightness to see it better). The picture shows the distance from the red contour depicted with pixel intensities, so in the middle of the image, where the distance is at its maximum, the intensities are highest. This is a manifestation of the distance transform. Here is an immediate application: the green shape is a so-called active contour, or snake, which moves according to the gradient of distances from the contour (while also following some other constraints) and curls around the red outline. Thus one application of the distance transform is shape processing.
Another application is text recognition: one of the powerful cues for text is a stable stroke width. The distance transform run on segmented text can confirm this; the corresponding method is called the stroke width transform (SWT).
As for aligning two rotated shapes, I am not sure how you can use the DT. You can find the center of a shape in order to rotate it, but you could rotate it about any point as well. The difference will only be a translation, which is irrelevant if you run matchTemplate to match them in the correct orientation.
Perhaps if you upload your images it will be clearer what to do. In general, you can match them as a whole, or by features (which is more robust to various deformations or perspective distortions), or even using outlines/silhouettes if there are only a few features. Finally, you can figure out the orientation of your object (if it has a dominant orientation) by running PCA or fitting an ellipse (as a rotated rectangle).
cv::RotatedRect rect = cv::fitEllipse(points2D);
float angle_to_rotate = rect.angle;
The distance transform is an operation that works on a single binary image and fundamentally measures, for every empty point (zero pixel), the distance to the nearest boundary point (non-zero pixel).
An example is provided here and here.
The measurement can be based on various definitions, calculated discretely or precisely: e.g. Euclidean, Manhattan, or Chessboard. Indeed, the parameters in the OpenCV implementation allow some of these, and control their accuracy via the mask size.
The function can return the output measurement image (floating point) - as well as a labelled connected components image (a Voronoi diagram). There is an example of it in operation here.
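A minimal sketch of calling it in OpenCV (the file name is a placeholder; note that OpenCV's distanceTransform measures from each non-zero pixel to the nearest zero pixel, so the contour mask is inverted first):

#include <opencv2/opencv.hpp>

int main()
{
    // Binary mask where the contour/boundary pixels are white (non-zero).
    cv::Mat mask = cv::imread("contour_mask.png", cv::IMREAD_GRAYSCALE);  // placeholder
    cv::threshold(mask, mask, 127, 255, cv::THRESH_BINARY);

    // Invert so the contour becomes the zero set that distances are measured to.
    cv::Mat inverted;
    cv::bitwise_not(mask, inverted);

    cv::Mat dist, labels;
    cv::distanceTransform(inverted, dist, labels, cv::DIST_L2, 3, cv::DIST_LABEL_CCOMP);

    // Brightest pixels are farthest from the contour.
    cv::normalize(dist, dist, 0, 1.0, cv::NORM_MINMAX);
    cv::imshow("distance transform", dist);
    cv::waitKey(0);
    return 0;
}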
I see from another question you asked recently that you are looking to register two images together. I don't think the distance transform is really what you are looking for here. If you are looking to align a set of points, I would instead suggest you look at techniques like Procrustes analysis, Iterative Closest Point (ICP), or RANSAC.