It's just a simple thing that I need to clarify.
I need a little refresher in mathematics:
For a circle, should the length of the gradient be the radius?
Or do we use the gradient only to get the orientation?
I got to this question after reading about gradients in image processing:
I've read this answer and this one about how to get the image gradient, and of course here.
I don't understand whether the magnitude should stand for a number of pixels, or whether it just stands for the strength of the intensity change at a specific point.
The following image is the magnitude of the gradient:
I ran the code and looked at the magnitude values, and the numbers are clearly not in the range of the image width/height.
I'm just waiting for a simple clarification.
Thanks!
Mathematically speaking, the gradient magnitude, or in other words the norm of the gradient vector, represents the derivative (i.e. the slope) of a 2D signal. This is quite clear in the definition given by Wikipedia:
Here, f is the 2D signal and x^, y^ (which I'll write as ux and uy in the following) are unit vectors in the horizontal and vertical directions, respectively.
In the context of images, the 2D signal (i.e. the image) is discrete instead of being continuous, hence the derivative is approximated by the difference between the intensity of the current pixel and the intensity of the previous pixel, in the considered direction (actually, there are several ways to approximate the derivative, but let's keep it simple). Hence, we can approximate the gradient by the following quantity:
gradient f (u,v) = [ f(u,v)-f(u-1,v) ] . ux + [ f(u,v)-f(u,v-1) ] . uy
In this case, the gradient magnitude is the following:
|| gradient f (u,v) || = square_root { [ f(u,v)-f(u-1,v) ]² + [ f(u,v)-f(u,v-1) ]² }
To summarize, the gradient magnitude is a measure of the local intensity change at a given point; it does not have much to do with a radius, nor with the width/height of the image.
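As a quick check, here is a minimal Python/NumPy sketch of the approximation above (the file name is just a placeholder):

import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Backward differences, matching the approximation above
# (np.roll wraps around, so the first row/column of dx/dy is not meaningful)
dx = img - np.roll(img, 1, axis=1)   # f(u,v) - f(u-1,v)
dy = img - np.roll(img, 1, axis=0)   # f(u,v) - f(u,v-1)

magnitude = np.sqrt(dx**2 + dy**2)
# For an 8-bit image the values lie roughly in [0, 255*sqrt(2)] regardless of
# the image width/height -- they only measure local intensity change.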
Related
I want to find sharp edges in a heightmap image, while ignoring shallow edges.
OpenCV offers multiple approaches to finding edges in a 2d Image: Canny, Sobel, etc.
However, all these approaches work by comparing the intensity values on both sides of the edge.
If the 2D Image represents a height map of a 3D object, then this results in some weird behaviour.
In a height map, the height of a 3D object at a given X/Y coordinate is represented as the intensity of the 2D Pixel at that X/Y coordinate:
In the above picture, at the edge B the intensity changes only slightly between the left and right side, even though it is a sharp corner.
At the edge A, there is a big change in intensity between pixels on the left side of the edge and the right, even though it is only a shallow angle.
So there is no threshold for Canny or Sobel that will preserve the sharp edge but filter the shallow edge.
(In the above example, the edge B has one side with an ascending slope, and one side with a descending slope. I could filter for this feature; but that would remove the edges C and D as well)
How can I get a binary edge image, containing only edges above a certain angle? (e.g. edge B, C, and D, but not A)
Or alternatively, how can I get a gradient derivative image, where the intensity of each pixel is proportional to the angle of the edge at that pixel?
You'll probably want to use the second derivative instead of the first for this task.
Here's my intuition: the derivative of the height (intensity in your case) at each position on an evenly spaced grid is proportional to the tangent of the surface slope angle between sampling points (or at the sampling points if you use a two-sided derivative approximation). But since you want to detect sharp edges, you are looking for the derivative of the slope at the sampling points. This means you can set a threshold on the derivative of the arctan of the derivative of the intensity to achieve your goal (luckily there's no "need to go deeper" :) )
You will have to be extra careful when taking the derivative of the "slope angles" you get: depending on the coordinate system you may come across an ambiguity in the angle difference (there are two ways to get from one angle to another, which differ in the general case; you're looking for the "shorter" one). You can look for a possible solution here.
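A minimal sketch of this idea with OpenCV in Python (assuming the height map is a single-channel image and that height units and pixel spacing are comparable; otherwise scale the derivatives before the arctan, and note the threshold value is a placeholder):

import cv2
import numpy as np

heightmap = cv2.imread("heightmap.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# First derivative of height -> surface slope, then slope angle via arctan
gx = cv2.Sobel(heightmap, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(heightmap, cv2.CV_32F, 0, 1, ksize=3)
angle_x = np.arctan(gx)   # slope angle along x, in radians
angle_y = np.arctan(gy)   # slope angle along y, in radians

# Derivative of the slope angle: large values mean the slope changes sharply
ddx = cv2.Sobel(angle_x, cv2.CV_32F, 1, 0, ksize=3)
ddy = cv2.Sobel(angle_y, cv2.CV_32F, 0, 1, ksize=3)
crease = np.sqrt(ddx**2 + ddy**2)

# The threshold is a free parameter you would have to tune
edges = np.uint8(crease > 0.5) * 255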
I have a rather simple approach that I came across while reading a blog post.
It involves computing the median value of the gray scale image. Using this value we can now set two threshold values:
lower: max(0, (1.0 - 0.33) * v)
upper: min(255, (1.0 + 0.33) * v)
Now pass these two values as parameters into the cv2.Canny() function.
You will now be able to perform an optimized edge detection given any image. The crux of this answer depends on the median value of the image which varies for different images.
If I understand your question correctly, what you need is basically a corner with high intensity values.
If that is so, then look at the Harris corner detector, which will help you find points with a high gradient change in both directions.
http://docs.opencv.org/2.4/doc/tutorials/features2d/trackingmotion/harris_detector/harris_detector.html
Once you detect the corners you can filter the corners which have high intensity by using a suitable threshold.
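A minimal sketch with the Python bindings (the file name and thresholds are placeholders, not tuned values):

import cv2
import numpy as np

gray = np.float32(cv2.imread("heightmap.png", cv2.IMREAD_GRAYSCALE))

# Harris response is large where the gradient changes strongly in both directions
response = cv2.cornerHarris(gray, 2, 3, 0.04)   # blockSize=2, ksize=3, k=0.04

# Keep only the strongest corners; the 0.01 factor is a threshold to tune
corners = response > 0.01 * response.max()

You could then additionally mask corners by the intensity (height) values, as suggested above.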
I am trying to implement an image matching algorithm based on gradient orientation matching. The main algorithm consists of the following steps:
1) convert the image to polar coordinates:
2) calculate gradients using the Sobel operator:
Xgrad = cv2.Sobel(gr,cv2.CV_64F,1,0,ksize=5)
Ygrad = cv2.Sobel(gr,cv2.CV_64F,0,1,ksize=3)
3) calculate the orientation of the gradient and binarize it:
Now I can compare images using the resulting feature map, with tolerance to rotation and small changes.
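Step 3 is done roughly like this (simplified; the bin count and magnitude threshold here are only illustrative), starting from Xgrad and Ygrad above:

import cv2
import numpy as np

magnitude, angle = cv2.cartToPolar(Xgrad, Ygrad)        # angle in radians, 0..2*pi
n_bins = 8                                              # illustrative bin count
orientation = np.uint8(angle / (2 * np.pi) * n_bins) % n_bins
orientation[magnitude < 10] = 255                       # weak gradients get a "no orientation" label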
But what I have found is that this algorithm handles rotations of the same image rather poorly. I have built a test image with circles to test it:
and rotated it by 10 degrees.
Here are the polar conversions:
And the gradient orientation masks: as you can see, there is already a lot of noise in the gradient matrix, and it breaks the matching algorithm.
And its best-matching difference mask: whole line areas are marked as not matched. Small Gaussian blurring at different steps does not help at all, and I don't know why.
Update:
Gradient calculation:
# Sobel derivatives with ksize=1 (plain 3x1 / 1x3 difference kernels, no smoothing)
gx = cv2.Sobel(gr, cv2.CV_64F, 1, 0, ksize=1)
gy = cv2.Sobel(gr, cv2.CV_64F, 0, 1, ksize=1)
# Anisotropic Gaussian smoothing of each derivative component
blurredgx = cv2.GaussianBlur(gx, (11, 3), 1)
blurredgy = cv2.GaussianBlur(gy, (11, 3), 1)
# Gradient magnitude and orientation (angle in radians)
magnitude, angle = cv2.cartToPolar(blurredgx, blurredgy)
Could you please explain how you've computed the orientation of the gradients? I believe you've grouped the pixels into 4x4 windows and computed the orientation of the gradients inside each such window. But the Sobel operator used has size 5x5, which obviously results in some overlap. Could you elaborate on this?
I am not able to understand the formula.
What do W (the window) and intensity in the formula mean?
I found this formula in the OpenCV docs:
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.html
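For reference, the formula on that page is the windowed sum of squared intensity differences:

E(u,v) = sum over (x,y) of  w(x,y) * [ I(x+u, y+v) - I(x,y) ]^2

where w(x,y) is the window function (the "W" I am asking about) and I is the image intensity.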
For a grayscale image, the intensity level (0-255) tells you how bright a pixel is; I hope you already know about that.
So, the explanation of your formula is below:
Aim: We want to find the points which have maximum variation in terms of intensity level in all directions, i.e. the points which are very distinctive in a given image.
I(x,y): This is the intensity value of the current pixel which you are processing at the moment.
I(x+u,y+v): This is the intensity of another pixel which lies at a distance of (u,v) from the current pixel (mentioned above) which is located at (x,y) with intensity I(x,y).
I(x+u,y+v) - I(x,y): This equation gives you the difference between the intensity levels of two pixels.
W(u,v): You don't compare the current pixel with any other pixel located at some random position. You prefer to compare the current pixel with its neighbors, so you choose some values for "u" and "v", as you do when applying a Gaussian mask, mean filter, etc. So, basically, w(u,v) represents the window within which you compare the intensity of the current pixel with that of its neighbors.
This link should clear up all your doubts.
For visualizing the algorithm, consider the window function as a BoxFilter, Ix as a Sobel derivative along x-axis and Iy as a Sobel derivative along y-axis.
http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/sobel_derivatives/sobel_derivatives.html will be useful to understand the final equations in the above pdf.
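A rough sketch of that visualization with the Python bindings (the kernel sizes and the k constant below are just placeholder values):

import cv2
import numpy as np

gray = np.float32(cv2.imread("image.png", cv2.IMREAD_GRAYSCALE))
Ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # derivative along x
Iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # derivative along y

# Window function as a box filter over the products of derivatives
Sxx = cv2.boxFilter(Ix * Ix, cv2.CV_32F, (5, 5))
Syy = cv2.boxFilter(Iy * Iy, cv2.CV_32F, (5, 5))
Sxy = cv2.boxFilter(Ix * Iy, cv2.CV_32F, (5, 5))

# Harris response R = det(M) - k * trace(M)^2, computed per pixel
k = 0.04
R = (Sxx * Syy - Sxy**2) - k * (Sxx + Syy)**2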
I am working with OpenCV on the Android platform. With the tremendous help from this community and techies, I am able to successfully detect a sheet out of the image.
These are the step I used.
Imgproc.cvtColor()
Imgproc.Canny()
Imgproc.GaussianBlur()
Imgproc.findContours()
Imgproc.approxPolyDP()
findLargestRectangle()
Find the vertices of the rectangle
Order the vertices (top-left first, anticlockwise) using a center-of-mass approach
Find the height and width of the rectangle (just to maintain the aspect ratio) and do the warpPerspective transformation.
After applying all these steps I can easily get the document, or the largest rectangle, from an image. But it depends heavily on the difference in intensity between the background and the document sheet. As the Canny edge detector works on the principle of intensity gradients, a difference in intensity is always assumed on the implementation side. That is why Canny takes into account two threshold parameters:
Lower threshold
Higher threshold
So if the intensity gradient of a pixel is greater than the higher threshold, it will be added as an edge pixel in the output image. A pixel will be rejected completely if its intensity gradient value is lower than the lower threshold. And if a pixel has an intensity between the lower and higher threshold, it will only be added as an edge pixel if it is connected to any other pixel having the value larger than the higher threshold.
My main purpose is to use Canny edge detection for the document scanning. So how can I compute these thresholds dynamically so that it can work with the both cases of dark and light background?
I tried a lot by manually adjusting the parameters, but I couldn't find any relationship associated with the scenarios.
You could calculate your thresholds using Otsu’s method.
The (Python) code would look like this:
high_thresh, thresh_im = cv2.threshold(im, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
lowThresh = 0.5*high_thresh
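You can then pass these straight to Canny, e.g.:
edges = cv2.Canny(im, lowThresh, high_thresh)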
Use the following snippet which I obtained from this blog:
import cv2
import numpy as np

sigma = 0.33   # see the explanation below
v = np.median(gray_image)
#---- Apply automatic Canny edge detection using the computed median ----
lower = int(max(0, (1.0 - sigma) * v))
upper = int(min(255, (1.0 + sigma) * v))
edged = cv2.Canny(gray_image, lower, upper)
cv2.imshow('Edges', edged)
So what am I doing here?
I am taking the median value of the grayscale image. A sigma value of 0.33 is chosen to set the lower and upper thresholds; 0.33 is a value commonly used by statisticians, so it is used here as well.
I'm trying to implement rectangle detection using the Hough transform, based on
this paper.
I programmed it using Matlab, but after the detection of parallel pair lines and orthogonal pairs, I must detect the intersection of these pairs. My question is about the quality of the two line intersection in Hough space.
I found the intersection points by solving four systems of equations. Do these intersection points lie in Cartesian or polar coordinate space?
For those of you wondering about the paper, it's:
Rectangle Detection based on a Windowed Hough Transform by Cláudio Rosito Jung and Rodrigo Schramm.
Now, according to the paper, the intersection points are expressed in polar coordinates; obviously your implementation may be different (the only way to tell is for you to show us your code).
Assuming you are being consistent with the paper's notation, your peaks should be expressed as pairs (rho_k, theta_k) in the Hough parameter space.
You must then perform the peak pairing given by equation (3) in section 4.3, where T_theta represents the angular threshold corresponding to parallel lines and T_L is the normalized threshold corresponding to lines of similar length.
The accuracy of the Hough space should be dependent on two main factors.
The first is the accumulator: it maps onto Hough space, and looping through the accumulator array requires that the accumulator divide Hough space into a discrete grid, so the resolution of that grid matters.
The second factor in the accuracy of linear Hough space is the location of the origin in the original image. Look for a moment at what happens if you do a sweep of \theta for any given change in \rho. Near the origin, one of these sweeps will cover far fewer pixels than a sweep out near the edges of the image. This has the consequence that near the edges of the image you need a much higher \rho, \theta resolution in your accumulator to achieve the same level of accuracy when transforming back to Cartesian.
The problem with increasing the resolution, of course, is that you will need more computational power and memory. Also, if you uniformly increase the accumulator resolution, you waste resolution near the origin where it is not needed.
Some ideas to help with this:
- Place the origin right at the center of the image, as opposed to using the natural bottom-left or top-left of an image in code.
- Try using the closest image you can get to a square. The more elongated an image is for a given area, the more pronounced the resolution trap becomes at the edges.
- Try dividing your image into 4/9/16 etc. different accumulators, each with an origin in the center of that sub-image. It will require a little overhead to link the results of each accumulator together for rectangle detection, but it should help spread the resolution more evenly.
- The ultimate solution would be to increase the resolution linearly depending on the distance from the origin. This can be achieved using the circle equation (x-a)^2 + (y-b)^2 = \rho^2, where x,y are the current pixel, a,b are your chosen origin, and \rho is the radius. Once the radius is known, adjust your accumulator resolution accordingly. You will have to keep track of the center of each \rho \theta bin for transforming back to Cartesian.
The link to the referenced paper does not work, but if you used the standard Hough transform then the four intersection points will be expressed in Cartesian coordinates. In fact, the four lines detected with the Hough transform will be expressed using the "normal parametrization":
rho = x cos(theta) + y sin(theta)
so you will have four pairs (rho_i, theta_i) that identify your four lines. After checking for orthogonality (for example, just by comparing the angles theta_i) you solve four systems of equations, each of the form:
rho_j = x cos(theta_j) + y sin(theta_j)
rho_k = x cos(theta_k) + y sin(theta_k)
where x and y are the unknowns that represent the Cartesian coordinates of the intersection point.
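As a concrete illustration in Python/NumPy (the rho/theta values below are made up), each such system is just a 2x2 linear solve:

import numpy as np

# Hypothetical (rho, theta) pairs for two roughly orthogonal detected lines
rho_j, theta_j = 120.0, 0.10
rho_k, theta_k = 80.0, 1.65

A = np.array([[np.cos(theta_j), np.sin(theta_j)],
              [np.cos(theta_k), np.sin(theta_k)]])
b = np.array([rho_j, rho_k])

x, y = np.linalg.solve(A, b)   # Cartesian coordinates of the intersection point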
I am not a mathematician. I am willing to stand corrected...
From Hough 2) ... any line on the xy plane can be described as p = x cos theta + y sin theta. In this representation, p is the normal distance and theta is the normal angle of a straight line, ... In practical applications, the angles theta and distances p are quantized, and we obtain an array C(p, theta).
from CRC standard math tables Analytic Geometry, Polar Coordinates in a Plane section ...
Such an ordered pair of numbers (r, theta) are called polar coordinates of the point p.
Straight lines: let p = distance of line from O, w = counterclockwise angle from OX to the perpendicular through O to the line. Normal form: r cos(theta - w) = p.
From this I conclude that the points lie in polar coordinate space.