Is Canny edge detection rotationally invariant? - machine-learning

Suppose that the Canny edge detector successfully detects an edge in an image. The edge is then rotated by θ, where the relationship between a point on the original edge (x,y) and a point on the rotated edge (x′,y′) is defined as x′ = x cos θ; y′ = x sin θ.
Will the rotated edge be detected using the same Canny edge detector?
(I think we should find the answer by considering that the detection of an edge by the Canny edge detector depends only on the magnitude of its derivative.)

The answer is both yes and no, and which one you go for depends on how literally you take the question.
First of all, we're dealing with a rectangular grid, so given an integer location (x,y), the corresponding point (x',y') in a rotated image is highly likely not an integer location. And considering that the output of Canny is a set of points, and not a smooth function that can be interpolated, it would be difficult to establish a correspondence between the set resulting from the rotated and the one resulting from the original image.
Think for example about the number of pixels on a discrete line of a given length at 0 degrees and at 45 degrees. (Hint: the line at 45 degrees has sqrt(2) times fewer pixels.)
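The hint can be checked with a small sketch. It assumes an 8-connected digital line, which advances exactly one pixel per step along its longer axis; the function name is made up for illustration.

```python
import math

def pixels_on_line(length, angle_deg):
    """Count pixels on an 8-connected discrete line of the given Euclidean length."""
    theta = math.radians(angle_deg)
    dx = round(length * math.cos(theta))
    dy = round(length * math.sin(theta))
    # One pixel per step along the longer axis, plus the starting pixel.
    return max(abs(dx), abs(dy)) + 1

n0 = pixels_on_line(100, 0)     # horizontal line
n45 = pixels_on_line(100, 45)   # diagonal line
print(n0, n45, n0 / n45)        # 101 pixels vs 72 pixels, ratio close to sqrt(2)
```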
But if you take the question more generally and interpret it as "will an edge that is detected in the original image also be detected after rotating the image by θ degrees?" then the answer is yes, in theory.
Of course practice is always a bit different than theory. The details of the implementation matter here. And there is always numerical imprecision to contend with.
Let's start by assuming the rotation is computed correctly, with a precise interpolation scheme (cubic, Lanczos) and not rounded after to uint8 or something (i.e. we're computing using floating-point values).
If you read the original paper by Canny, you'll see he proposes using Gaussian derivatives as the best compromise between compact support and computational precision. I have seen few implementations that actually do. Typically I see a convolution with a Gaussian and then Sobel derivatives. Especially for smaller sigmas (less smoothing) the difference can be quite large. Gaussian derivatives are rotationally invariant, Sobel derivatives are not.
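One way to see the difference is to compare how each operator responds to a plane wave of fixed spatial frequency as the direction of the wave changes. The sketch below uses the frequency responses of the 3x3 Sobel kernels and of ideal continuous Gaussian derivatives. The function names and the chosen frequency are assumptions for illustration, and a sampled Gaussian derivative kernel will only approximate the ideal response.

```python
import math

def sobel_response(r, theta):
    """Gradient-magnitude response of the 3x3 Sobel pair to a plane wave
    with spatial frequency r (radians/pixel) travelling at angle theta."""
    wx, wy = r * math.cos(theta), r * math.sin(theta)
    # [-1, 0, 1] has response magnitude 2*sin(w); [1, 2, 1] has 2 + 2*cos(w).
    gx = 2 * math.sin(wx) * (2 + 2 * math.cos(wy))
    gy = 2 * math.sin(wy) * (2 + 2 * math.cos(wx))
    return math.hypot(gx, gy) / 8  # normalised so low frequencies give about r

def gaussian_derivative_response(r, theta, sigma=1.0):
    """Same quantity for ideal (continuous) Gaussian derivatives."""
    wx, wy = r * math.cos(theta), r * math.sin(theta)
    g = math.exp(-0.5 * sigma ** 2 * (wx ** 2 + wy ** 2))
    return math.hypot(wx * g, wy * g)

for deg in (0, 45):
    t = math.radians(deg)
    print(deg, sobel_response(1.0, t), gaussian_derivative_response(1.0, t))
```

The Gaussian derivative response depends only on the frequency, while the Sobel response also changes with the angle, and the gap grows for higher frequencies (that is, for less smoothing).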
The next step in the algorithm is non-maximum suppression. This is where the continuous gradient is converted to a set of points. For each pixel, it checks to see if it is a local maximum in the direction of the gradient. Because this is done per pixel, a different set of locations is tested in the rotated image compared to the original. Nonetheless, it should detect points along the same ridges in both cases.
Next, a hysteresis threshold is applied. This is a two-threshold operation that keeps pixels above one threshold as long as at least one pixel above a second threshold is present in the same connected component. This is where differences could occur between the rotated and the original image. Remember we're dealing with a set of pixels. We have sampled the continuous gradient function at discrete points. There could be an edge that has one pixel above the second threshold in one version of the image, but not in the other. This would only occur for edges very close to the chosen threshold, of course.
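A minimal 1D sketch of the hysteresis step shows how this can happen. Here a connected component is simply a contiguous run of samples, and the two sample lists are made-up values standing for the same ridge sampled at slightly different positions.

```python
def hysteresis_1d(values, low, high):
    """Keep runs of samples >= low that contain at least one sample >= high."""
    keep = [False] * len(values)
    i = 0
    while i < len(values):
        if values[i] >= low:
            j = i
            while j < len(values) and values[j] >= low:
                j += 1                      # extend the run of weak-or-strong samples
            if any(v >= high for v in values[i:j]):
                for k in range(i, j):
                    keep[k] = True          # the run contains a strong sample: keep it all
            i = j
        else:
            i += 1
    return keep

original = [0.2, 0.6, 0.8, 1.01, 0.7]    # peak sample just above the high threshold
rotated = [0.2, 0.65, 0.85, 0.99, 0.6]   # same ridge, sampled slightly off the peak
print(hysteresis_1d(original, low=0.5, high=1.0))
print(hysteresis_1d(rotated, low=0.5, high=1.0))
```

With these values the whole edge is kept in the first case and dropped in the second, even though the underlying gradient is the same.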
Next comes a thinning. Because the non-maximum suppression can yield points along a thicker line, a thinning operation is applied that removes pixels from the set that are not needed to maintain connectivity of the lines. Which pixels are selected here will also differ between rotated and original images, but this does not change the geometry of the solution, so we still have the same set of points.
So, the answer is yes and no. :)
Note that the same logic applies to translation.

Related

Image Processing(oblique image)

I hope you can give me some suggestions. I want to semantically segment an image of cyanobacteria on a lake and calculate the area the cyanobacteria cover. The image was taken at an oblique angle, not vertically. How should I preprocess it so that the actual area can be calculated accurately from pixel counts? The image is as follows.
You can't make calibrated measurements (in true units of area) without knowing the scaling factors. So you should let a calibration target float on the water, wholly in the field of view*.
If the viewing distance is sufficiently large that the perspective effect can be neglected, the transformation is affine and it suffices to take the ratio of the apparent area of the cyanobacteria (in pixels) over the apparent area of the target (in pixels), times the true area of the target**.
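In the affine case the computation is a single ratio. A minimal sketch, with made-up pixel counts and a hypothetical 1 m² target:

```python
def true_area(region_px, target_px, target_area):
    """Scale a pixel count to real units using a target of known area."""
    return region_px / target_px * target_area

# Hypothetical numbers: the target covers 2,500 pixels and is 1 square metre.
area_m2 = true_area(region_px=50_000, target_px=2_500, target_area=1.0)
print(area_m2)  # 20.0
```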
If the perspective is strong, the transformation is a homography, and things get a little more complicated. From four points of the target (say corners), you can obtain the coefficients of the homography that maps the viewed points to undistorted space. Then you need to undistort the cyanobacteria area outline (as a polygon) and you can compute its area by the shoelace formula.
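The homography route can be sketched in plain Python as follows. All coordinates are hypothetical, the homography is normalised so that its last entry is 1, and the function names are invented for this example. In practice a library routine would normally do the fitting.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """3x3 homography (last entry fixed to 1) mapping four src points to four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def warp(h, p):
    """Apply the homography to one point."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def shoelace(poly):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

target_img = [(100, 100), (300, 100), (350, 300), (50, 300)]    # target corners in pixels
target_world = [(0, 0), (1, 0), (1, 1), (0, 1)]                 # a 1 m x 1 m target
H = homography(target_img, target_world)
outline_img = [(120, 150), (280, 140), (300, 260), (110, 250)]  # segmented outline in pixels
area_m2 = shoelace([warp(H, p) for p in outline_img])
print(area_m2)
```

Warping the target's own corners and checking that the resulting area equals the known target area is a cheap sanity check.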
You can also completely straighten the image before segmentation, though this is not really necessary.
*You could think of obtaining the scaling factors by knowing viewing angles and distance, but that method will be impractical to use in the field.
**Take a picture of a large square. If it appears like a parallelogram, you are good. If like a general quadrilateral, perspective must be corrected.

Finding edges in a height map

I want to find sharp edges in a heightmap image, while ignoring shallow edges.
OpenCV offers multiple approaches to finding edges in a 2d Image: Canny, Sobel, etc.
However, all these approaches work by comparing the intensity values on both sides of the edge.
If the 2D Image represents a height map of a 3D object, then this results in some weird behaviour.
In a height map, the height of a 3D object at a given X/Y coordinate is represented as the intensity of the 2D Pixel at that X/Y coordinate:
In the above picture, at the edge B the intensity changes only slightly between the left and right side, even though it is a sharp corner.
At the edge A, there is a big change in the intensity between pixels on the left side of the edge and the right, even though it is only a shallow angle.
So there is no threshold for Canny or Sobel that will preserve the sharp edge but filter the shallow edge.
(In the above example, the edge B has one side with an ascending slope, and one side with a descending slope. I could filter for this feature; but that would remove the edges C and D as well)
How can I get a binary edge image, containing only edges above a certain angle? (e.g. edge B, C, and D, but not A)
Or alternatively, how can I get a gradient derivative image, where the intensity of each pixel is proportional to the angle of the edge at that pixel?
Probably you'll want to use second derivative instead of first for this task.
Here's my intuition: taking the derivative of height (intensity in your case) at each position on an evenly spaced grid gives the surface slope between sampling points (or at sampling points if you use a 2-sided derivative approximation), and the arctan of that derivative is the slope angle. But since you want to detect sharp edges - you are looking for the derivative of the slope angle at the sampling points. This means that you can set a threshold on a derivative of arctan of derivative of intensity to achieve your goal (luckily there's no "need to go deeper" :) )
You will have to be extra careful with taking a derivative of the "slope angles" that you'll get - depending on the coordinate system you may come across an ambiguity in the angle difference (there are 2 ways to get from one angle to another, which are different in the general case; you're looking for the "shorter" one). You can look for a possible solution here
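A 1D sketch of this idea is below. It assumes that heights and sample spacing are in the same units, and the profile is made up: a long gentle ramp (a large total height change, like edge A) followed by a small sharp ridge (like edge B). In 1D the slope angles stay within ±90 degrees, so the wrap-around ambiguity mentioned above does not arise; in 2D, with gradient directions, it would need handling.

```python
import math

def corner_angles(heights, spacing=1.0):
    """Change in slope angle (degrees) at each interior sample of a 1D height profile."""
    slopes = [math.atan2(b - a, spacing) for a, b in zip(heights, heights[1:])]
    return [abs(math.degrees(s2 - s1)) for s1, s2 in zip(slopes, slopes[1:])]

# flat, gentle ramp rising to 4.0, flat, one-sample ridge, flat
profile = [0.0] * 5 + [0.2 * i for i in range(1, 21)] + [4.0] * 5 + [6.0] + [4.0] * 5

angles = corner_angles(profile)
# angles[k] belongs to sample k + 1, because the end samples have no angle change.
sharp = [k + 1 for k, a in enumerate(angles) if a > 45.0]
print(sharp)  # [29, 30, 31]: only the ridge, not the ends of the ramp
```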
I have a rather simple approach that I came across while reading a blog post.
It involves computing the median value of the gray scale image. Using this value we can now set two threshold values:
lower: max(0, (1.0 - 0.33) * v)
upper: min(255, (1.0 + 0.33) * v)
Now pass these two values as parameters into the cv2.Canny() function.
You will now be able to perform an optimized edge detection given any image. The crux of this answer depends on the median value of the image which varies for different images.
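A minimal sketch of the threshold computation, using a flat list of gray values so that it runs without OpenCV. The function name and the `sigma` parameter name are assumptions; 0.33 is the value from the formulas above.

```python
import statistics

def auto_canny_thresholds(pixels, sigma=0.33):
    """Derive the two Canny thresholds from the median gray value."""
    v = statistics.median(pixels)
    lower = max(0, (1.0 - sigma) * v)
    upper = min(255, (1.0 + sigma) * v)
    return lower, upper

lower, upper = auto_canny_thresholds([10, 100, 200])
print(lower, upper)
# With OpenCV available, these would be passed on as: cv2.Canny(gray, lower, upper)
```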
If I understand your question correctly, "what you need is basically a corner with high intensity values".
If that is so then look for Harris corner detector which would help you to find points with high gradient change in both direction.
http://docs.opencv.org/2.4/doc/tutorials/features2d/trackingmotion/harris_detector/harris_detector.html
Once you detect the corners you can filter the corners which have high intensity by using a suitable threshold.

Understanding Distance Transform in OpenCV

What is Distance Transform? What is the theory behind it? If I have 2 similar images but in different positions, how does distance transform help in overlapping them? The results that the distance transform function produces look divided in the middle. Is that to find the center of one image so that the other is overlapped just halfway? I have looked into the documentation of OpenCV but it's still not clear.
Look at the picture below (you may want to increase your monitor brightness to see it better). The picture shows the distance from the red contour depicted with pixel intensities, so in the middle of the image where the distance is maximum the intensities are highest. This is a manifestation of the distance transform. Here is an immediate application - a green shape is a so-called active contour or snake that moves according to the gradient of distances from the contour (and also follows some other constraints) and curls around the red outline. Thus one application of distance transform is shape processing.
Another application is text recognition - one of the powerful cues for text is a stable width of a stroke. The distance transform run on segmented text can confirm this. A corresponding method is called stroke width transform (SWT)
As for aligning two rotated shapes, I am not sure how you can use DT. You can find a center of a shape to rotate the shape but you can also rotate it about any point as well. The difference will be just in translation which is irrelevant if you run matchTemplate to match them in correct orientation.
Perhaps if you upload your images it will be more clear what to do. In general you can match them as a whole or by features (which is more robust to various deformations or perspective distortions) or even using outlines/silhouettes if there are only a few features. Finally you can figure out the orientation of your object (if it has a dominant orientation) by running PCA or fitting an ellipse (as rotated rectangle).
cv::RotatedRect rect = cv::fitEllipse(points2D);
float angle_to_rotate = rect.angle;
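The distance transform is an operation that works on a single binary image and measures, for every pixel, the distance to the nearest boundary pixel. Conventions differ on which pixels count as the boundary: OpenCV's distanceTransform computes, for each non-zero pixel, the distance to the nearest zero pixel (zero pixels get a distance of 0).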
An example is provided here and here.
The measurement can be based on various definitions, calculated discretely or precisely: e.g. Euclidean, Manhattan, or Chessboard. Indeed, the parameters in the OpenCV implementation allow some of these, and control their accuracy via the mask size.
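To make the definitions concrete, here is a brute-force sketch in plain Python. It assumes the OpenCV convention (each non-zero pixel gets the distance to the nearest zero pixel) and an image with at least one zero pixel. It is not how OpenCV computes the result, which uses much faster algorithms.

```python
import math

METRICS = {
    "euclidean": lambda dx, dy: math.hypot(dx, dy),
    "manhattan": lambda dx, dy: abs(dx) + abs(dy),
    "chessboard": lambda dx, dy: max(abs(dx), abs(dy)),
}

def distance_transform(image, metric="euclidean"):
    """Distance from each non-zero pixel to the nearest zero pixel (brute force)."""
    dist = METRICS[metric]
    zeros = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v == 0]
    return [[min(dist(x - zx, y - zy) for zx, zy in zeros) if v else 0
             for x, v in enumerate(row)]
            for y, row in enumerate(image)]

image = [[0, 1, 1],
         [1, 1, 1],
         [1, 1, 1]]
for name in METRICS:
    print(name, distance_transform(image, name)[2][2])
```

For the bottom-right pixel the three metrics give about 2.83, 4 and 2 respectively, which shows how much the choice of metric matters along diagonals.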
The function can return the output measurement image (floating point) - as well as a labelled connected components image (a Voronoi diagram). There is an example of it in operation here.
I see from another question you have asked recently that you are looking to register two images together. I don't think the distance transform is really what you are looking for here. If you are looking to align a set of points I would instead suggest you look at techniques like Procrustes, Iterative Closest Point, or RANSAC.

opencv SimpleBlobDetector filterByInertia meaning?

I don't understand what filterByInertia means... neither do I understand the documentation's little description :
By ratio of the minimum inertia to maximum inertia. Extracted blobs will have this ratio between minInertiaRatio (inclusive) and maxInertiaRatio (exclusive).
The above image pretty much explains what the different filter parameters do. SimpleBlobDetector is happiest when it sees a circular blob, and different filters filter out different kinds of deviations from the circular shape.
Inertia measures the ratio of the minor and major axes of a blob.
The figure also shows the difference between circularity and inertia. I have copied this figure from Blob Detection Tutorial at LearnOpenCV.com
I've been wondering this for a while also; the OpenCV documentation isn't very helpful when it comes to blob detection.
Based on the descriptions of other blob analyzers, the inertia of a blob is "the inertial resistance of the blob to rotation about its principal axes". It depends on how the mass of the blob (I guess in this case the area) is distributed throughout the blob's shape.
There's a lot of mathy stuff involved -- most of which I don't remember how to do -- but the result at the bottom of this page on the properties of binary images sums it up fairly well (blob detection is done by converting the input image to a series of binary images):
The ratio gives us some idea of how rounded the object is. This ratio will be 0 for a line and 1 for a circle.
So basically, by specifying minInertiaRatio and maxInertiaRatio you can filter the blobs based on how elongated they are. An inertia ratio of 0 will yield elongated blobs (closer to lines) and an inertia ratio of 1 will yield blobs where the area is more concentrated toward the center (closer to circles).
Here's a physical interpretation:
If you cut the blob out on a piece of card, you could find its center of gravity, and then attach an axle to it, crossing this point (the axle being parallel to the card), and then spin it, and measure its moment of inertia. Depending on the shape, you may get different values according to how you place the axle. For an ellipse, you get the lowest value when the axle is attached along the long (major) axis, and the largest when the axle is placed along the short axis (so that more of the card is far from the axle). For a circle the inertia is always the same, of course.
If there are different values, there will always be a 'max' inertia at some orientation, and a 'min' with the axle placed 90 degrees away from the 'max'. The inertia ratio is simply the ratio between these inertias, min/max.
For shapes which are not ellipses, the metric tells you whether the overall shape is roughly elongated, or roughly the same size in all directions; without caring in particular about an uneven boundary or cuts and concavities (which roundness and convexity look at).
Mathematically, it does something like this:
Consider the set of points within the blob to be a population of (x,y) samples
Find the mean of these, and the covariance matrix x vs. y
Find the two eigenvalues of the covariance matrix (which are the same as its singular values, due to the nature of this matrix)
The inertia ratio is the ratio between these two values, smallest/largest.
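The four steps above can be sketched in plain Python. This follows the description in this answer, not OpenCV's source code, and the function name is invented for the example. Because the eigenvalues are variances, the ratio behaves roughly like the squared ratio of the blob's short and long extents.

```python
import math

def inertia_ratio(points):
    """Smallest/largest eigenvalue of the covariance matrix of the blob's pixels."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]] in closed form.
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    lam_max, lam_min = half_trace + root, half_trace - root
    return lam_min / lam_max if lam_max > 0 else 1.0

disc = [(x, y) for x in range(-10, 11) for y in range(-10, 11) if x * x + y * y <= 100]
line = [(x, 0) for x in range(20)]
rect = [(x, y) for x in range(40) for y in range(10)]
print(inertia_ratio(disc), inertia_ratio(line), inertia_ratio(rect))
```

The disc gives a ratio of 1, the line gives 0, and the 40 by 10 rectangle lands near 0.06.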

What's the use of Canny before HoughLines (opencv)?

I'm new to image processing and I'm working on detecting lines in a document image. I read the theory of the Hough line transform but I can't see why I must use Canny before calling that function in OpenCV, as many tutorials say. What's the point of finding edges in this case? The fact is that if I don't use Canny or threshold before HoughLines() the results will be very messy. I hope someone can explain the reason why.
2 of the tutorials I've read:
Imgproc Feature Detection
Hough Line Transform
Short Answer
cvCanny is used to detect Edges, as well as increase contrast and remove image noise.
HoughLines, which uses the Hough Transform, is used to determine whether those edges are lines or not. The Hough Transform requires edges to be detected well in order to be efficient and provide meaningful results.
Long Answer
The Limitations of the Hough Transform are described in more detail on Wikipedia.
The efficiency of the Hough Transform relies on the bins of accumulated votes being distinct, e.g. a direct contrast between a pixel and its surrounding neighbours or, if using a mask region, between a pixel region and its surrounding regions. If all pixels had similar accumulated values, nothing would stand out as a line or circle. This leads to the reduction of colour (colour to grayscale, grayscale to black and white) in order to increase contrast.
The number of parameters to the Hough Transform also increases the spread of votes in the bins and increases the complexity of the transform, which means that normally only lines or circles are reliably detected using it, as they have at most 3 parameters (2 for a line, 3 for a circle).
The edges need to be detected well before running the Hough Transform, otherwise its efficiency suffers further. Also, noisy images don't work well with the Hough Transform unless the noise is removed beforehand.
First of all, to detect lines you need to work on a boolean matrix image (or binary), I mean: the color is black or white, there's no grayscale.
HoughLines()'s requirement to work properly is to have this kind of image as input. That's the reason you have to use Canny or a threshold, to convert the colored image matrix into a boolean one.
Hough transformation
A line in a picture is actually an edge. The Hough transform scans the whole image and uses a transformation that converts the cartesian coordinates of all white pixels into polar coordinates; the black pixels are left out. So you won't be able to get a line if you don't detect edges first, because HoughLines() doesn't know how to behave when there's a grayscale.
Theoretically, you are correct. Finding edges is not absolutely required for the Hough Line algorithm to work.
The way the Hough works is basically that it takes every point and lets it vote for every line that could pass through it, and whichever lines collect the most votes stay. For this, we need points. The Canny creates those points. Theoretically you could use any sort of filter - isolate all blue or purple points and connect them, whatever - but edges work well.
The Hough also does not weight its lines or points. To the Hough, an image is binary - made up of either 1s or 0s, points or not points. There is no need for greyscale, and the Canny conveniently returns binary images.
This is why the Canny nearly always comes before the Hough.
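To show why a set of points is all the Hough needs, here is a minimal voting sketch in plain Python. It uses the usual (theta, rho) line parameterisation with 1 degree and 1 pixel bins. The function name and the test points are made up, and this is not OpenCV's implementation.

```python
import math
from collections import Counter

def hough_lines(points, theta_step_deg=1):
    """Accumulate votes in (theta in degrees, rho in pixels) bins."""
    votes = Counter()
    for x, y in points:
        for t in range(0, 180, theta_step_deg):
            theta = math.radians(t)
            # Every line through (x, y) satisfies rho = x*cos(theta) + y*sin(theta).
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(t, rho)] += 1
    return votes

# 100 edge points on the horizontal line y = 5, plus two stray points.
points = [(x, 5) for x in range(100)] + [(17, 42), (63, 80)]
votes = hough_lines(points)
(best_theta, best_rho), count = votes.most_common(1)[0]
print(best_theta, best_rho, count)  # 90 5 100
```

Each input point counts as one unweighted vote, which is why the input is treated as binary: a pixel either is a point or it is not.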
It is all about processing binary data:
complex data -> (a binary data, b binary data, c binary data, ..) (using canny(),sobel(), etc)
a binary data -> function1() (using houghlines())
b binary data -> function2()
c binary data -> function3() ..
a binary data -X-> function2() ..
complex data -X-> function1() ..
HTH
