In image processing, the Laplacian filter adds the two second order derivatives, one in x direction and the other in y direction.
However, I am confused when we use first order derivative filters. In that case, we don't add the two first order derivatives. Instead we use the magnitude of the two first order derivatives, that is the L2 norm of the gradient.
I want to know why we don't add these two first order derivatives like Laplacian when we use first order derivative filters. Thanks a lot.
The Laplacian is defined as the trace of the Hessian matrix. The Hessian matrix collects all second-order derivatives, which include also things like d^2/dxdy. The diagonal of the Hessian are the second derivative along each axis. Thus, the trace is their sum. [You should look into the determinant of the Hessian, it’s an interesting operator too.]
The gradient is a vector, composed of the partial derivative along each axis. Its magnitude (norm) is the square root of the sum of the square elements.
These things are different because they have a different meaning and a different purpose.
Related
While learning an image denoising technique based on bilateral filter, I encountered this tutorial which provides with full lists of arguments used to run OpenCV's bilateralFilter function. What I see, it's slightly confusing, because there is no explanation about a mathematical rule to alter the diameter value by manipulating both the sigma arguments. So, if picking some specific arguments to pass into that function, I realize hardly what diameter corresponds with a particular couple of sigma values.
Does there exist a dependency between both deviations and the diameter? If my inference is correct, what equation (may be, introduced in OpenCV documentation) is to be referred if applying bilateral filter in a program-based solution?
According to the documentation, the bilateralFilter function in OpenCV takes a parameter d, the neighborhood diameter, as well as a parameter sigmaSpace, the spatial sigma. They can be selected separately, but if d "is non-positive, it is computed from sigmaSpace." For more details we need to look at the source code:
if( d <= 0 )
radius = cvRound(sigma_space*1.5);
else
radius = d/2;
radius = MAX(radius, 1);
d = radius*2 + 1;
That is, if d is not positive, then it is taken as 3 times sigmaSpace. d is also always forced to be odd, so that there is a central pixel in the neighborhood.
Note that the other sigma, sigmaColor, is unrelated to the spatial size of the filter.
In general, if one chooses a sigmaSpace that is too large for the given d, then the Gaussian kernel will be cut off in a way that makes it not appear like a Gaussian, and loose its nice filtering properties (see for example here for an explanation). If it is taken too small for the given d, then many pixels in the neighborhood will always have a near-zero weight, meaning that computational work is wasted. The default value is rather small (one typically uses a radius of 3 times sigma for Gaussian filtering), but is still quite reasonable given the computational cost of the bilateral filter (a smaller neighborhood is cheaper).
These two value (d and sigma) are totally unrelated to each other. Sigma determines the values of the pixels of the kernel, but d determines the size of the kernel.
For example consider this Gaussian filter with sigma=1:
It's a filter kernel and and as you can see the pixel values of the kernel only depends on sigma (the 3*3 matrix in the middle is equal in both kernel), but reducing the size of the kernel (or reducing the diameter) will make the outer pixels ineffective without effecting the values of the middle pixels.
And now if you change the sigma, (with k=3) the kernel is still 3*3 but the pixels' values would be different.
"Before extracting the lines, you need to detect potential points on them. Apply a Gaussian filter first and use the Sobel filters as derivative operators. Threshold the determinant of the Hessian and then apply non-maximum suppression in 3 × 3 neighborhoods. Ignore pixels for which any of the filters falls even partially out of the image boundaries."
I understand to gaussian an image first to eliminate noise, then take twice with Sobel_x and Sobel_y, respectively, which became Ixx and Iyy in Hessian that would show horizontal line and vertical line in image.But how am I suppose to get Ixxyy? But how could I combine these two image together to make Ixxyy as the right bottom in Hessian matrix?
The two off-diagonal elements of the Hessian matrix are d^2/dxdy. That is, they are the first derivative along y applied to the first derivative along x.
If the top-left element is obtained by Sobel_x( Sobel_x( image )), and the bottom-right element is Sobel_y( Sobel_y( image )), then the two other elements are both Sobel_y( Sobel_x( image )) or, equivalently, Sobel_x( Sobel_y( image )) (note that these two should be identical).
Do take into account that negative values are important here, and you should thus be careful to compute the Sobel filter in a way that preserves those negative values—don't store them in an unsigned integer array!
What is the different between LoG (Laplacian of Gaussian) filter, first and second derivative Gaussian filter?
Is it second derivative Gaussian filter equal to Laplacian Operator?
https://docs.opencv.org/3.4/d5/db5/tutorial_laplace_operator.html
When I search online, there are different shape said that is second derivative Gaussian filter. So which one is correct?
http://www.me.umn.edu/courses/me5286/vision/VisionNotes/2017/ME5286-Lecture7-2017-EdgeDetection2.pdf (P.39)
https://www.swarthmore.edu/NatSci/mzucker1/e27_s2016/filter-slides.pdf (P.31)
http://www.cse.psu.edu/~rtc12/CSE486/lecture11_6pp.pdf (P.3)
Yes, the Laplace is defined as the sum of second order partial derivatives. As in the equation you show.
In the first image, f is not a Gaussian, f' is. Thus f" there is the first derivative of the Gaussian. The other image shows the 2nd derivative of a Gaussian.
I'm trying to use opencv to find some template in images. While opencv has several template matching methods, I have big trouble to understand the difference and when to use which by looking at their mathematic equization:
CV_TM_SQDIFF
CV_TM_SQDIFF_NORMED
CV_TM_CCORR
CV_TM_CCORR_NORMED
CV_TM_CCOEFF
Can someone explain the major difference between all these method in a non-mathematical way?
The general idea of template matching is to give each location in the target image I, a similarity measure, or score, for the given template T. The output of this process is the image R.
Each element in R is computed from the template, which spans over the ranges of x' and y', and a window in I of the same size.
Now, you have two windows and you want to know how similar they are:
CV_TM_SQDIFF - Sum of Square Differences (or SSD):
Simple euclidian distance (squared):
Take every pair of pixels and subtract
Square the difference
Sum all the squares
CV_TM_SQDIFF_NORMED - SSD Normed
This is rarely used in practice, but the normalization part is similar in the next methods.
The nominator term is same as above, but divided by a factor, computed from the
- square root of the product of:
sum of the template, squared
sum of the image window, squared
CV_TM_CCORR - Cross Correlation
Basically, this is a dot product:
Take every pair of pixels and multiply
Sum all products
CV_TM_CCOEFF - Cross Coefficient
Similar to Cross Correlation, but normalized with their Covariances (which I find hard to explain without math. But I would refer to
mathworld
or mathworks
for some examples
Assuming that I have a grayscale (8-bit) image and assume that I have an integral image created from that same image.
Image resolution is 720x576. According to SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The
first octave uses filters with 9x9, 15x15, 21x21 and 27x27 pixels. The
second octave uses filters with 15x15, 27x27, 39x39 and 51x51 pixels.The third octave uses filters with 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large and I guess 720x576 is big enough (right??!!), a fourth octave is added, 51x51, 99x99, 147x147 and 195x195. These
octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are:What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) to apply filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = DxxDyy − (0.9Dxy)^2 anddet(HessianApprox) = DxxDyy − (0.81Dxy)^2Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second order derivatives).
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common Image Analysis approach for regions-of-interest detection that is called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being to mimic what happens in the visual cortex of the human brains.
The drawback to difference of Gaussians (DoG) is the computation time that is too expensive to be applied to large image areas.
In order to bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (or equivalently, apply a Gaussian blur) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average value of all the images values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and on its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-rightmost pixel of the integral image has thus the sum of all the original pixels for value.
Then, you just need to remark that the box blur is equal to the sum of all the pixels inside a given rectangle (not originating in the top-lefmost pixel of the image) and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C, D is a shortcut for IntegralImagePixelAt(A) (B, C, D respectively).
Integral images in SURF
SURF is not using box blurs of sizes 9x9, etc. directly.
What it uses instead is several orders of Gaussian derivatives, or Haar-like features.
Let's take an example. Suppose you are to compute the 9x9 filters output. This corresponds to a given sigma, hence a fixed scale/octave.
The sigma being fixed, you center your 9x9 window on the pixel of interest. Then, you compute the output of the 2nd order Gaussian derivative in each direction (horizontal, vertical, diagonal). The Fig. 1 in the paper gives you an illustration of the vertical and diagonal filters.
The Hessian determinant
There is a factor to take into account the scale differences. Let's believe the paper that the determinant is equal to:
Det = DxxDyy - (0.9 * Dxy)^2.
Finally, the determinant is given by: Det = DxxDyy - 0.81*Dxy^2.
Look at page 17 of this document
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you made a code for normal Gaussian 2D convolution, just use the box filter as a Gaussian kernel and the input image will be the same original image not integral image. The results from this method will be same with the one you asked.