Laplacian kernels of higher order in image processing

In the literature on digital image processing you find examples of Laplace kernels of relatively low order, typically 3 or 5. Is there a general way to build Laplace kernels of arbitrary order? Links and/or references would be appreciated.

The Laplace operator is defined as the sum of the second derivatives along each of the axes of the image. (That is, it is the trace of the Hessian matrix):
Δ I = ( ∂²/∂x² + ∂²/∂y² ) I
There are two common ways to discretize this:
Use finite differences. The derivative operator is the convolution by [1,-1] or [0.5,0,-0.5]. The second derivative operator is obtained by applying the [1,-1] convolution twice, which leads to a convolution with [1,-2,1].
Convolve with the derivative of a regularization kernel. The optimal regularization kernel is the Gaussian, leading to a Laplace of Gaussian operator. The result is the exact Laplace of the image smoothed by the Gaussian kernel.
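The finite-difference route can be checked with a few lines of NumPy. This is only a minimal sketch: the variable names are illustrative, and the 3x3 kernel is built by adding the horizontal and vertical second differences, as in the definition of the Laplace operator above.

```python
import numpy as np

# First-difference kernel from the text.
d1 = np.array([1.0, -1.0])

# Applying it twice is the same as convolving the kernel with itself.
d2 = np.convolve(d1, d1)

# 2D Laplacian: second difference along x plus second difference along y.
lap = np.zeros((3, 3))
lap[1, :] += d2   # horizontal second difference on the middle row
lap[:, 1] += d2   # vertical second difference on the middle column

print(d2)
print(lap)
```

The weights of the resulting kernel sum to zero, so a constant image gives a zero response.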
An alternative is to replace the regularization kernel with an interpolating kernel. A former colleague of mine published a paper on this method:
A. Hast, "Simple filter design for first and second order derivatives by a double filtering approach", Pattern Recognition Letters 42(1):65-71, 2014.
He used a "double filter", but with linear filters that can always be simplified to a single convolution.
The idea is simple: take an interpolating kernel and compute its derivative at integer locations. The interpolating kernel is always 1 at the origin and 0 at other integer locations, but it waves through these "knot points", meaning that its derivative is not zero at these integer locations.
In the extreme case, take the ideal interpolator, the sinc function:
sinc(x) = sin(πx) / (πx)
Its second derivative is:
d²/dx² sinc(x) = [ (2 - π²x²) sin(πx) - 2πx cos(πx) ] / (πx³)
Which sampled at 11 integer locations leads to:
[ 0.08 -0.125 0.222 -0.5 2 -3.29 2 -0.5 0.222 -0.125 0.08 ]
But note that the normalization is not correct here, as we're cutting off the infinitely long kernel. Thus, it's better to pick a shorter kernel, such as the cubic spline kernel.
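As a rough sketch of the sampling step, the second derivative of sinc(x) = sin(πx)/(πx) can be evaluated at integers directly. At a nonzero integer n the sine term vanishes and the expression reduces to -2(-1)^n / n², while the limit at x = 0 works out to -π²/3 ≈ -3.29. The function name below is made up for illustration.

```python
import numpy as np

def sinc_second_derivative(n):
    """Second derivative of sinc(x) = sin(pi x)/(pi x), evaluated at integers n."""
    n = np.asarray(n, dtype=float)
    out = np.full(n.shape, -np.pi**2 / 3.0)   # limit value at x = 0
    nz = n != 0
    # At nonzero integers sin(pi n) = 0 and cos(pi n) = (-1)^n.
    out[nz] = -2.0 * np.cos(np.pi * n[nz]) / n[nz] ** 2
    return out

taps = sinc_second_derivative(np.arange(-5, 6))
print(np.round(taps, 3))
print(taps.sum())   # not zero: the infinitely long kernel was cut off
```

The nonzero sum is the normalization problem mentioned above: a truncated kernel no longer gives a zero response on a constant image.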
A second alternative is to compute the Laplace operator through the Fourier domain. This simply requires multiplying by -(2πu)² - (2πv)², with u and v the frequencies.
This is some MATLAB code that applies this filter to a unit impulse image, leading to an image of the kernel of size 256x256:
[u,v] = meshgrid((-128:127)/256,(-128:127)/256);
Dxx = -4*(pi*u).^2;
Dyy = -4*(pi*v).^2;
L = Dxx + Dyy;
l = fftshift(ifft2(ifftshift(L)));
l = real(l); % discard insignificant imaginary component (probably not necessary in MATLAB, but Octave leaves these values there)
l(abs(l)<1e-6) = 0; % set near-zero values to zero
l here is the same as the result above for the ideal interpolator, adding the vertical and horizontal ones together, and normalizing for a length of 256.
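For readers without MATLAB, here is a NumPy sketch of the same computation. It assumes the same 256x256 grid and the same frequency convention as the MATLAB code; it is a port for illustration, not a tested replacement.

```python
import numpy as np

N = 256
freqs = np.arange(-N // 2, N // 2) / N        # same grid as (-128:127)/256
u, v = np.meshgrid(freqs, freqs)

# Fourier multiplier of the Laplacian: -(2*pi*u)^2 - (2*pi*v)^2
L = -4 * (np.pi * u) ** 2 - 4 * (np.pi * v) ** 2

# Response to a unit impulse, shifted so the origin sits at the image center.
l = np.real(np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(L))))

c = N // 2
print(np.round(l[c, c - 5:c + 6], 3))         # central row of the kernel
```

Only the central row and column of `l` are nonzero, because the multiplier is a sum of a function of u alone and a function of v alone.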
Finally, I'd like to mention that the Laplace operator is very sensitive to noise (high frequencies are enhanced significantly). The methods discussed here are meaningful only for data without noise (presumably synthetic data?). For any real-world data, I highly recommend that you use the Laplace of Gaussian. This will give you the exact Laplace of the smoothed image. The smoothing is necessary to prevent influence from noise. With little noise, you can use a small Gaussian sigma (e.g. σ=0.8). This will give you much more useful results than any other approach.

Related

Why do we normalize homography or fundamental matrix?

I want to know why we normalize the homography or fundamental matrix. Here is the code in particular:
H = H * (1.0 / H[2, 2]) # Normalization step. H is [3, 3] matrix.
I understand that we have to normalize the data before computing the SVD, because of the instability of linear least squares, but why do we normalize the result at the end?
A homography in 3D space has 8 degrees of freedom by definition, mapping from one plane to another using perspective. Such a homography can be defined by giving four points, which makes eight coordinates (scalars).
A 3x3 matrix has 9 elements, so it has 9 degrees of freedom. That is one degree more than needed for a homography.
The homography doesn't change when the matrix is scaled (multiplied by a scalar). All the math works the same. You don't need to normalize your homography matrix.
Still, it is a good idea to normalize.
For one, it makes the arithmetic somewhat tamer. Have some wikipedia links to fields of study because weaving all these into a coherent sentence... doesn't add anything:
Numerical analysis, Condition number, Floating-point arithmetic, Numerical error, Numerical stability, ...
Also, normalization makes the matrix easier for humans to interpret. The most common normalization is to scale the matrix such that the last element becomes 1. That is convenient because this whole math happens in a projective space, where the projection causes points to be mapped to the w=1 plane, making vectors have a 1 for the last element.
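The scale invariance described above can be illustrated with a small NumPy sketch. The matrix values and the helper name `apply_homography` are hypothetical and chosen only for the example.

```python
import numpy as np

def apply_homography(H, x, y):
    """Map the 2D point (x, y) through H, then divide by w."""
    p = H @ np.array([x, y, 1.0])
    return p[:2] / p[2]

# Hypothetical homography, chosen only for illustration.
H = np.array([[2.0,   0.1,    5.0],
              [0.0,   1.5,   -3.0],
              [0.001, 0.002,  4.0]])

H_scaled = 7.5 * H        # same homography, different scale
H_norm = H / H[2, 2]      # normalized so that the last element is 1

print(apply_homography(H, 10.0, 20.0))
print(apply_homography(H_scaled, 10.0, 20.0))
print(apply_homography(H_norm, 10.0, 20.0))
```

All three calls map the point to the same location, because the common scale factor cancels in the division by w.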
How is the homography matrix provided to you?
For example, suppose some library function calculates the homography matrix and provides it to you,
and the function specification doesn't mention the scale...
In an extreme case, the function could be implemented as:
Matrix3x3 CalculateHomographyMatrix( some arguments )
{
    Matrix3x3 H = ...; // Homography calculation
    return Non_Zero_Random_Value * H; // Wow!
}
Element values may become very large or very small, and using such values in your processing may cause problems (floating-point computation errors).

What is a mathematical relation of diameter and sigma arguments in bilateral filter function?

While learning an image denoising technique based on the bilateral filter, I encountered this tutorial, which lists all the arguments used to run OpenCV's bilateralFilter function. It is slightly confusing, because it gives no mathematical rule that relates the diameter to the two sigma arguments. So when I pick specific arguments to pass to that function, I can hardly tell what diameter corresponds to a particular pair of sigma values.
Is there a dependency between the two standard deviations and the diameter? If so, which equation (perhaps one given in the OpenCV documentation) should I refer to when applying the bilateral filter in a program?
According to the documentation, the bilateralFilter function in OpenCV takes a parameter d, the neighborhood diameter, as well as a parameter sigmaSpace, the spatial sigma. They can be selected separately, but if d "is non-positive, it is computed from sigmaSpace." For more details we need to look at the source code:
if( d <= 0 )
    radius = cvRound(sigma_space*1.5);
else
    radius = d/2;
radius = MAX(radius, 1);
d = radius*2 + 1;
That is, if d is not positive, then the radius is taken as 1.5 times sigmaSpace, so d is approximately 3 times sigmaSpace. d is also always forced to be odd, so that there is a central pixel in the neighborhood.
Note that the other sigma, sigmaColor, is unrelated to the spatial size of the filter.
In general, if one chooses a sigmaSpace that is too large for the given d, then the Gaussian kernel will be cut off in a way that makes it not appear like a Gaussian, and it will lose its nice filtering properties (see for example here for an explanation). If it is taken too small for the given d, then many pixels in the neighborhood will always have a near-zero weight, meaning that computational work is wasted. The default value is rather small (one typically uses a radius of 3 times sigma for Gaussian filtering), but is still quite reasonable given the computational cost of the bilateral filter (a smaller neighborhood is cheaper).
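The size logic from the source excerpt can be mirrored in a few lines of Python to see which diameter a given pair (d, sigmaSpace) ends up with. This is only a sketch: `effective_diameter` is a made-up name, and Python's `round` is assumed to stand in for `cvRound`, which may treat exact .5 cases differently.

```python
def effective_diameter(d, sigma_space):
    """Mirror of the size logic in the OpenCV excerpt (illustrative only)."""
    if d <= 0:
        radius = round(sigma_space * 1.5)   # stand-in for cvRound
    else:
        radius = d // 2                     # integer division, as in C
    radius = max(radius, 1)
    return 2 * radius + 1                   # always odd

print(effective_diameter(0, 2.0))    # diameter derived from sigmaSpace
print(effective_diameter(5, 10.0))   # explicit d wins, sigmaSpace is ignored here
print(effective_diameter(4, 1.0))    # even d is bumped to the next odd value
```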
These two values (d and sigma) are totally unrelated to each other. Sigma determines the values of the pixels of the kernel, while d determines the size of the kernel.
For example consider this Gaussian filter with sigma=1:
It's a filter kernel, and as you can see the pixel values of the kernel depend only on sigma (the 3*3 matrix in the middle is equal in both kernels). Reducing the size of the kernel (that is, reducing the diameter) makes the outer pixels ineffective without affecting the values of the middle pixels.
And now if you change the sigma, (with k=3) the kernel is still 3*3 but the pixels' values would be different.
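A small NumPy sketch can stand in for the kernels described here. It assumes unnormalized spatial weights exp(-(x² + y²) / (2σ²)); if the kernel were normalized to sum to 1, cropping would also rescale the remaining values by a constant. The function name is illustrative.

```python
import numpy as np

def gaussian_weights(size, sigma):
    """Unnormalized spatial Gaussian weights on a size x size grid (size odd)."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2))

k5 = gaussian_weights(5, 1.0)
k3 = gaussian_weights(3, 1.0)        # same sigma, smaller diameter
k3_wide = gaussian_weights(3, 2.0)   # same diameter, larger sigma

print(np.allclose(k5[1:4, 1:4], k3))   # middle of the 5x5 kernel equals the 3x3 kernel
print(np.round(k3, 3))
print(np.round(k3_wide, 3))
```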

Approximating true heightmap gradient magnitude with opencv's Sobel filter

I have an image (cv::Mat, type CV_32F) representing grid-sampled height function. The grid has constant raster (dx,dy) per pixel.
I would like to estimate its gradient magnitude. Using OpenCV's Sobel filter, I approximate derivatives like this:
dfdx=cv2.Sobel(zz,cv2.CV_32F,1,0,ksize=3,scale=?)
dfdy=cv2.Sobel(zz,cv2.CV_32F,0,1,ksize=3,scale=?)
gradMag=np.sqrt(dfdx**2+dfdy**2)
The scale parameter is barely documented, but looking into the source, it is used to multiply derivative kernels, i.e. the (-1,0,1) for finite differences. Using the 3x3 Sobel kernel, I assumed the scale should then be 1/(2*dx) or 1/(2*dy) (finite differences scheme) to obtain derivatives in true scale, but that does not seem to be the case: I was testing this on a synthetic image of a hemisphere with different rasters, but I am not getting consistent results.
How is scale supposed to be used to incorporate raster dimensions, thus getting real derivative estimates?
Scale must be equal to 0.25, from here: OpenCV's Sobel filter - why does it look so bad, especially compared to Gimp?
The normalization divisor for kernels can be calculated by the following formula:
f = max(abs(sumNegative), abs(sumPositive))
where sumNegative is the sum of negative values in the kernel and sumPositive the sum of positive values in the kernel.
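As a sketch, the divisor rule can be evaluated for the 3x3 Sobel kernel with NumPy, which gives f = 4 and hence the factor 0.25. The second half is a separate check under stated assumptions: on a hypothetical plane z = a·x sampled every dx units, the raw kernel response is 8·a·dx, which suggests that recovering the slope in physical units needs a factor of 1/(8·dx) rather than 0.25 alone. The values of `a` and `dx` are made up.

```python
import numpy as np

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Divisor rule from the answer above.
sum_negative = sobel_x[sobel_x < 0].sum()
sum_positive = sobel_x[sobel_x > 0].sum()
f = max(abs(sum_negative), abs(sum_positive))

# Hypothetical test surface: a plane z = a * x sampled every dx units.
a, dx = 0.3, 2.5
cols = np.arange(5) * dx
z = np.tile(a * cols, (5, 1))

# Kernel response at an interior pixel (plain correlation with the kernel).
response = np.sum(sobel_x * z[1:4, 1:4])
slope_estimate = response / (8.0 * dx)

print(f, response, slope_estimate)
```

The factor 8 is the smoothing part of the kernel (1 + 2 + 1 = 4) times the two-pixel span of the central difference.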

normalization in image processing

What is the correct meaning of normalization in image processing? I googled it but found different definitions. I'll try to explain each definition in detail.
Normalization of a kernel matrix
If normalization refers to a matrix (such as a kernel matrix for a convolution filter), usually each value of the matrix is divided by the sum of the values of the matrix, so that the values sum to one (if all values are greater than zero). This is useful because a convolution between an image matrix and our kernel matrix then gives an output image with values between 0 and the max value of the original image. But if we use a Sobel matrix (which has some negative values) this is not true anymore, and we have to stretch the output image in order to have all values between 0 and the max value.
Normalization of an image
I basically found two definitions of normalization. The first one is to "cut" values that are too high or too low, i.e. if the image matrix has negative values one sets them to zero, and if it has values higher than the max value one sets them to the max value. The second one is to linearly stretch all the values in order to fit them into the interval [0, max value].
I will extend the answer from @metsburg a bit. There are several ways of normalizing an image (in general, a data vector), which are used at convenience for different cases:
Data normalization or data (re-)scaling: the data is projected into a predefined range (usually [0, 1] or [-1, 1]). This is useful when you have data from different formats (or datasets) and you want to normalize all of them so you can apply the same algorithms to them. It is usually performed as follows:
Inew = (I - I.min) * (newmax - newmin)/(I.max - I.min) + newmin
Data standardization is another way of normalizing the data (used a lot in machine learning), where the mean is subtracted from the image and the result is divided by its standard deviation. It is especially useful if you are going to use the image as an input for some machine learning algorithm, as many of them perform better when features have a Gaussian form with mean=0, std=1. It can be performed easily as:
Inew = (I - I.mean) / I.std
Data stretching (or histogram stretching, when you work with images) is what you refer to as option 2. Usually the image is clamped to minimum and maximum values, setting:
Inew = I
Inew[I < a] = a
Inew[I > b] = b
Here, image values that are lower than a are set to a, and values higher than b are set to b. Usually, a and b are calculated as percentile thresholds: a is the threshold that separates the bottom 1% of the data and b is the threshold that separates the top 1% of the data. By doing this, you are removing outliers (noise) from the image.
This is similar to (but simpler than) histogram equalization, which is another commonly used preprocessing step.
Data normalization can also refer to the normalization of a vector with respect to a norm (l1 norm or l2/Euclidean norm). In practice, this translates to:
Inew = I / ||I||
where ||I|| refers to a norm of I.
If the norm is chosen to be the l1 norm, the image is divided by the sum of its absolute values, making the sum of the absolute values of the whole image equal to 1. If the norm is chosen to be l2 (or Euclidean), then the image is divided by the square root of the sum of the squared values of I, making the sum of the squared values of I equal to 1.
The first 3 are widely used with images (not all 3 at once, as scaling and standardization are incompatible, but 1 of them, or scaling + stretching, or standardization + stretching); the last one is not that useful. It is usually applied as a preprocessing step for some statistical tools, but not if you plan to work with a single image.
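The four variants listed above could be sketched in NumPy roughly as follows. The function names, the 1%/99% percentile defaults, and the random test image are illustrative assumptions, not part of any library API.

```python
import numpy as np

def rescale(img, new_min=0.0, new_max=1.0):
    """Linear rescaling into [new_min, new_max]."""
    return (img - img.min()) * (new_max - new_min) / (img.max() - img.min()) + new_min

def standardize(img):
    """Zero mean, unit standard deviation."""
    return (img - img.mean()) / img.std()

def clip_percentiles(img, low=1.0, high=99.0):
    """Clamp values below the low percentile and above the high percentile."""
    a, b = np.percentile(img, [low, high])
    return np.clip(img, a, b)

def normalize_by_norm(img, order=2):
    """Divide by the l1 (order=1) or l2 (order=2) norm of the flattened image."""
    return img / np.linalg.norm(img.ravel(), ord=order)

# Synthetic image, used only to exercise the functions.
rng = np.random.default_rng(0)
img = rng.normal(100.0, 20.0, size=(32, 32))

print(rescale(img).min(), rescale(img).max())
print(standardize(img).mean(), standardize(img).std())
print((normalize_by_norm(img, 2) ** 2).sum())
```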
The answer by @Imanol is great; I just want to add some examples:
Normalize the input either pixel wise or dataset wise. Three normalization schemes are often seen:
Normalizing the pixel values between 0 and 1:
img /= 255.0
Normalizing the pixel values between -1 and 1 (as Tensorflow does):
img /= 127.5
img -= 1.0
Normalizing according to the dataset mean & standard deviation (as Torch does):
img /= 255.0
mean = [0.485, 0.456, 0.406] # Here it's ImageNet statistics
std = [0.229, 0.224, 0.225]
for i in range(3): # Assumes CHW ordering (channel, height, width) for a single image
    img[i, :, :] -= mean[i]
    img[i, :, :] /= std[i]
In data science, there are two broadly used normalization types:
1) Where we try to rescale the data so that their sum is a particular value, usually 1 (https://stats.stackexchange.com/questions/62353/what-does-it-mean-to-use-a-normalizing-factor-to-sum-to-unity)
2) Normalize data to fit it within a certain range (usually, 0 to 1): https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range

Optimal sigma for Gaussian filtering of an image?

When applying a Gaussian blur to an image, typically the sigma is a parameter (examples include Matlab and ImageJ).
How does one know what sigma should be? Is there a mathematical way to figure out an optimal sigma? In my case, I have some objects in images that are bright compared to the background, and I need to find them computationally. I am going to apply a Gaussian filter to make the center of these objects even brighter, which hopefully facilitates finding them. How can I determine the optimal sigma for this?
There's no formula to determine it for you; the optimal sigma will depend on image factors - primarily the resolution of the image and the size of your objects in it (in pixels).
Also, note that Gaussian filters aren't actually meant to brighten anything; you might want to look into contrast maximization techniques - sounds like something as simple as histogram stretching could work well for you.
edit: More explanation - sigma basically controls how "fat" your kernel function is going to be; higher sigma values blur over a wider radius. Since you're working with images, bigger sigma also forces you to use a larger kernel matrix to capture enough of the function's energy. For your specific case, you want your kernel to be big enough to cover most of the object (so that it's blurred enough), but not so large that it starts overlapping multiple neighboring objects at a time - so actually, object separation is also a factor along with size.
Since you mentioned MATLAB - you can take a look at various gaussian kernels with different parameters using the fspecial('gaussian', hsize, sigma) function, where hsize is the size of the kernel and sigma is, well, sigma. Try varying the parameters to see how it changes.
I use this convention as a rule of thumb: if k is the size of the kernel, then sigma = (k-1)/6. This is because about 99.7% of the mass of a Gaussian pdf lies within a width of 6 sigma (±3 sigma).
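The rule of thumb is easy to check numerically. The sketch below uses only the standard library; the function names are made up, and the mass within ±3 sigma is computed from the error function for a 1D Gaussian.

```python
import math

def sigma_from_kernel_size(k):
    """Rule of thumb from the answer: the kernel spans about 6 sigma."""
    return (k - 1) / 6.0

def mass_within(num_sigmas):
    """Fraction of a 1D Gaussian within +/- num_sigmas standard deviations."""
    return math.erf(num_sigmas / math.sqrt(2.0))

print(sigma_from_kernel_size(7))   # 7-pixel kernel
print(mass_within(3.0))            # mass covered by +/- 3 sigma
```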
You have to find a min/max of a function G(X, sigma), where X is the set of your observations (in your case, your image grayscale values). This function can be anything that maintains the "order" of the intensities of the image. For example, this can be done with the first derivative of the image (as G):
fil = fspecial('sobel');
im = imfilter(I,fil);
imagesc(im);
colormap(gray);
This gives you the first derivative of the image. Now you want to find the best sigma by
maximizing G(X,sigma): you try a few sigmas (say, in increasing order) until you reach a sigma that makes G maximal. This can also be done with the second derivative.
Given that the central value of the kernel equals 1, the kernel size that guarantees the outermost value is less than a limit (e.g. 1/100) can be computed as follows:
double limit = 1.0 / 100.0;
size = static_cast<int>(2 * std::ceil(sqrt(-2.0 * sigma * sigma * log(limit))));
if (size % 2 == 0)
{
    size++;
}
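The same computation, sketched in Python with a check of the outermost weight. It assumes an unnormalized 1D Gaussian exp(-x² / (2σ²)) with central value 1; both function names are illustrative.

```python
import math

def kernel_size(sigma, limit=0.01):
    """Odd kernel size whose outermost weight is at most `limit`."""
    half = math.ceil(math.sqrt(-2.0 * sigma * sigma * math.log(limit)))
    return 2 * half + 1

def outermost_weight(sigma, size):
    """Weight of the outermost tap when the central weight is 1."""
    r = size // 2
    return math.exp(-(r * r) / (2.0 * sigma * sigma))

for sigma in (1.0, 2.0):
    size = kernel_size(sigma)
    print(sigma, size, outermost_weight(sigma, size))
```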
