Laplacian of Gaussian filter use - image-processing

This is the formula for LoG filtering:
LoG(x, y) = -1/(πσ⁴) · [1 - (x² + y²)/(2σ²)] · exp(-(x² + y²)/(2σ²))
(source: ed.ac.uk)
Also, in applications that use LoG filtering, I see that the function is called with only one parameter: sigma (σ).
I want to try LoG filtering using that formula (my previous attempt was a Gaussian filter followed by a Laplacian filter with some filter-window size).
But looking at the formula I can't understand how the size of the filter is connected with it. Does that mean the filter size is fixed?
Can you explain how to use it?

As you've probably figured out by now from the other answers and links, the LoG filter detects edges and lines in the image. What is still missing is an explanation of what σ is.
σ is the scale of the filter. Is a one-pixel-wide line a line or noise? Is a line 6 pixels wide a line, or an object with two distinct parallel edges? Is a gradient that changes from black to white across 6 or 8 pixels an edge or just a gradient? It's something you have to decide, and the value of σ reflects your decision: the larger σ is, the wider the lines, the smoother the edges, and the more noise is ignored.
Do not confuse the scale of the filter (σ) with the size of its discrete approximation (usually called the stencil). In Paul's link σ = 1.4 and the stencil size is 9. While it is usually reasonable to use a stencil size of 4σ to 6σ, these two quantities are quite independent: a larger stencil provides a better approximation of the filter, but in most cases you don't need a very good approximation.
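To make the distinction concrete, here is a minimal sketch (my own Python/NumPy code, not from the linked page) that builds a discrete LoG stencil whose size is tied to σ by the 6σ rule of thumb:

import numpy as np

def log_kernel(sigma, size=None):
    # Stencil size defaults to roughly 6*sigma, forced odd so there is a centre pixel.
    if size is None:
        size = int(np.ceil(6 * sigma))
        if size % 2 == 0:
            size += 1
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    k = -1.0 / (np.pi * sigma**4) * (1 - r2 / (2 * sigma**2)) * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()   # shift so the stencil sums to zero (flat regions give zero response)

print(log_kernel(1.4).shape)   # (9, 9) for sigma = 1.4, matching the example above

Increasing the stencil size for a fixed σ only improves the approximation; changing σ changes what the filter responds to.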

This was something that confused me too, and it wasn't until I had to do the same as you for a uni project that I understood what you were supposed to do with the formula!
You can use this formula to generate a discrete LoG filter. If you write a bit of code to implement that formula, you can then generate a filter for use in image convolution. To generate, say, a 5x5 template, simply call the code with x and y ranging from -2 to +2.
This will generate the values to use in a LoG template. If you graph the values this produces, you should see the "Mexican hat" shape typical of this filter, like so:
(Mexican hat surface plot; source: ed.ac.uk)
You can fine-tune the template by changing how wide it is (the size) and the sigma value (how broad the peak is). The wider and broader the template, the less the result will be affected by noise, because it operates over a wider area.
Once you have the filter, you can apply it to the image by convolving the template with the image. If you've not done this before, check out these few tutorials (the original links were to a Java applet tutorial and a more maths-heavy one).
Essentially, at each pixel location, you "place" your convolution template, centred at that pixel. You then multiply the surrounding pixel values by the corresponding "pixel" in the template and add up the result. This is then the new pixel value at that location (typically you also have to normalise (scale) the output to bring it back into the correct value range).
The code below gives a rough idea of how you might implement this. Please forgive any mistakes / typos etc. as it hasn't been tested.
I hope this helps.
private float LoG(float x, float y, float sigma)
{
    // LoG(x, y) = -1/(pi*sigma^4) * (1 - (x^2 + y^2)/(2*sigma^2)) * exp(-(x^2 + y^2)/(2*sigma^2))
    float r2 = x * x + y * y;
    float sigma2 = sigma * sigma;
    return (float) ((-1.0 / (Math.PI * sigma2 * sigma2))
            * (1.0 - r2 / (2.0 * sigma2))
            * Math.exp(-r2 / (2.0 * sigma2)));
}

private float[][] GenerateTemplate(int templateSize, float sigma)
{
    // Make sure it's an odd number for convenience
    if (templateSize % 2 == 0)
        throw new IllegalArgumentException("templateSize must be odd");

    // Create the data array
    float[][] template = new float[templateSize][templateSize];

    // Work out the min and max values. LoG is centred around (0, 0),
    // so for a size 5 template (say) we want to feed the values
    // -2, -1, 0, +1, +2 into the formula.
    int min = -templateSize / 2;
    int max = templateSize / 2;

    // We also need counters to index into the data array...
    int xCount = 0;
    for (int x = min; x <= max; ++x)
    {
        int yCount = 0;   // reset for each column
        for (int y = min; y <= max; ++y)
        {
            // Get the LoG value for this (x, y) pair
            template[xCount][yCount] = LoG(x, y, sigma);
            ++yCount;
        }
        ++xCount;
    }
    return template;
}
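To then apply the generated template, the convolution described above can be done with any image-processing library. Here is a rough sketch of my own (Python/SciPy rather than the Java above), assuming a grayscale image held in a NumPy array:

import numpy as np
from scipy.ndimage import convolve

def apply_log(image, template):
    # Convolve the template with the image, replicating edge pixels at the borders.
    response = convolve(image.astype(np.float32), template, mode="nearest")
    # Optionally rescale the output back into the 0..255 range for display.
    response -= response.min()
    if response.max() > 0:
        response *= 255.0 / response.max()
    return response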

Just for visualization purposes, here is a simple MATLAB 3D colored plot of the Laplacian of Gaussian (Mexican hat) wavelet. You can change the sigma (σ) parameter and see its effect on the shape of the graph:
sigmaSq = 0.5;  % square of the σ parameter
[x, y] = meshgrid(linspace(-3,3), linspace(-3,3));
z = (-1/(pi*sigmaSq^2)) .* (1 - (x.^2 + y.^2)/(2*sigmaSq)) .* exp(-(x.^2 + y.^2)/(2*sigmaSq));
surf(x, y, z)
You could also compare the effects of the sigma parameter on the Mexican hat by doing the following:
t = -5:0.01:5;
sigma = 0.5;
mexhat05 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
sigma = 1;
mexhat1 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
sigma = 2;
mexhat2 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
plot(t, mexhat05, 'r', ...
t, mexhat1, 'b', ...
t, mexhat2, 'g');
Or simply use the Wavelet toolbox provided by Matlab as follows:
lb = -5; ub = 5; n = 1000;
[psi,x] = mexihat(lb,ub,n);
plot(x,psi), title('Mexican hat wavelet')
I found this useful when implementing this for edge detection in computer vision. Although not the exact answer, hope this helps.

It appears to be a continuous circular filter whose radius is sqrt(2) * sigma. If you want to implement this for image processing you'll need to approximate it.
There's an example for sigma = 1.4 here: http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm

Related

How to set the region (and its shape) over which the SIFT descriptor is computed?

A similar question has been asked here. However I could not understand it clearly.
I understand that SIFT computation has the following steps:
Finding scale space extrema
Keypoint localization (and filtering)
Orientation assignment (using computation of gradient magnitude and orientation)
Create SIFT descriptor
My question is for the fourth step: How to set the region over which the SIFT descriptor is computed? Also how is the shape of the region for SIFT computation determined?
Suppose the scale-space extremum was found at scale "s" in the second octave. I use the gradient orientation to align to a canonical orientation. How do I set the region of computation of the SIFT descriptor using this information? Do I use the scale or the magnitude of the gradient to find the region on which SIFT is to be computed? And how is the shape of the region determined?
So this was surprisingly tricky to find an answer for.
David Lowe's original paper only seemed to provide a vague theoretical explanation of how his algorithm worked.
And as far as I know, his official implementation never had its feature descriptor code open-sourced.
So I'm basing my answer on what I consider the next-most canonical implementation of the SIFT algorithm, Rob Hess' OpenSIFT implementation,
which became the base for OpenCV's official implementation.
Anyway, here is my understanding of how SIFT roughly works:
Once you have located your extremum, you should know which octave & interval of the Gaussian pyramid it belongs to.
Based on Rob's code (these two functions on lines 1026-1112), the feature descriptor is calculated from the blurred image of that octave & interval.
And the region for calculating SIFT is a square shape surrounding the keypoint. This medium article also seems to agree (see illustration).
The SIFT formula for the Gaussian Kernel scale, relative to the original image size is (reference):
base_scale * 2^(octave + interval / intervals_per_octave)
Or this formula if working relative to the halved image in each octave:
base_scale * 2^(interval / intervals_per_octave)
Where the original paper defined the parameters through experiments as:
base_scale = 1.6 and intervals_per_octave = 3
So if your SIFT was set to have 3 intervals per octave, with a base Gaussian scale of 1.6, and the extremum was found at octave 2, interval 3,
the image will have been blurred by a Gaussian kernel of scale 1.6 * 2^(2 + 3/3) = 12.80 pixels.
Now the actual array size of the Gaussian kernel will depend on the code you use, as the scale and the kernel size can be set independently.
In cases like MATLAB, I've found helpful guidelines in this SO thread.
The selected answer recommends a kernel width of 6 times the scale (i.e. the 3-sigma rule), so our kernel width (and height) is 12.80 * 6 ≈ 77 pixels;
thus, a SIFT descriptor region of size 77x77 pixels.
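Putting those numbers into code, a small sketch (my own Python, following the formulas above and the 6-sigma rule of thumb, not code from any SIFT implementation) would be:

def sift_region_size(octave, interval, base_scale=1.6, intervals_per_octave=3):
    # Gaussian scale relative to the original image size
    sigma = base_scale * 2 ** (octave + interval / intervals_per_octave)
    # "width of 6 times the kernel scale" rule of thumb
    size = int(round(6 * sigma))
    return sigma, size

print(sift_region_size(2, 3))   # (12.8, 77) -> a 77x77 pixel region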
Meanwhile, the OpenCV implementation appears to leave the size of the kernel to be determined by OpenCV's own built-in Gaussian Blur function.
Line 246 of OpenCV's code leaves the Gaussian blur function's ksize parameter as zeroes,
for which the official docs only state that the kernel size will be "computed from sigma", without defining how it is actually calculated...
Finally, for Rob's implementation, I have to admit that I couldn't quite understand what was happening in this final step. ¯\_(ツ)_/¯
From lines 1026-1112 Rob defined the code below, which shows how he calculates the orientation histogram for the SIFT descriptor.
The code shows he defined a radius and used the nested for-loops with i and j to iterate through the square region around the keypoint, located at point (r,c).
Yet what I don't really understand is:
How he defined radius, with the Gaussian scale scl multiplied by some unknown constant SIFT_DESCR_SCL_FCTR = 3.0
As well as hist_width * sqrt(2) * ( d + 1.0 ) * 0.5 + 0.5, where d = SIFT_DESCR_WIDTH = 4
hist_width = SIFT_DESCR_SCL_FCTR * scl;
radius = hist_width * sqrt(2) * ( d + 1.0 ) * 0.5 + 0.5;
for( i = -radius; i <= radius; i++ )
for( j = -radius; j <= radius; j++ )
{
/*
Calculate sample's histogram array coords rotated relative to ori.
Subtract 0.5 so samples that fall e.g. in the center of row 1 (i.e.
r_rot = 1.5) have full weight placed in row 1 after interpolation.
*/
c_rot = ( j * cos_t - i * sin_t ) / hist_width;
r_rot = ( j * sin_t + i * cos_t ) / hist_width;
rbin = r_rot + d / 2 - 0.5;
cbin = c_rot + d / 2 - 0.5;
if( rbin > -1.0 && rbin < d && cbin > -1.0 && cbin < d )
if( calc_grad_mag_ori( img, r + i, c + j, &grad_mag, &grad_ori ))
{
grad_ori -= ori;
while( grad_ori < 0.0 )
grad_ori += PI2;
while( grad_ori >= PI2 )
grad_ori -= PI2;
obin = grad_ori * bins_per_rad;
w = exp( -(c_rot * c_rot + r_rot * r_rot) / exp_denom );
interp_hist_entry( hist, rbin, cbin, obin, grad_mag * w, d, n );
}
}
But regardless of how the exact size of the region is calculated, I think the general concept is the same:
calculate the region size based on the original Gaussian scale.
Besides, given that the features are supposed to be "weighted by a Gaussian window" (original paper, section 6.1, page 15),
as long as the region you define is large enough to contain most of the meaningful orientation histograms, you are fine.
In summary:
The SIFT descriptor is calculated from the halved & blurred image of the same octave/interval as the keypoint (OpenSIFT)
The region for the SIFT descriptor is a square shape surrounding the keypoint (medium)(image)
The region size is calculated from the Gaussian kernel scale; the exact method can vary, but an easy rule of thumb is "width of 6 times the kernel scale" (thread)

SURF: How could we get the value of sigma from the keypoint radius

In the SURF technique, and more precisely within the feature description stage, the authors state (if I understand correctly) that the description is performed in an area of 20 times sigma, where sigma represents the scale at which the keypoint was detected.
Sigma = 0.4 x L, where L = 2^Octave x level + 1. If we use the OpenCV implementation, the DetectAndCompute function computes, via the value of Keypoint.size, the radius of the circle surrounding the keypoint.
My question is : How could we get the value of sigma from the radius value ?
According to these lines:
KeyPoint& kp = (*keypoints)[k];
float size = kp.size;
Point2f center = kp.pt;
/* The sampling intervals and wavelet sized for selecting an orientation
and building the keypoint descriptor are defined relative to 's' */
float s = size*1.2f/9.0f;
This value s = size*1.2f/9.0f is not mentioned in Bay's article (which gives scale = L*0.4, or scale = L*1.2/3).
Can anyone explain this part to me?

How do you calculate the average gradient direction and average gradient strength/magnitude

In OpenCV how do you calculate the average gradient strength in a Mat and the average gradient direction?
I have sourced the below methods by googling, but I want to confirm I am actually doing this correctly before moving on to the next step.
Is this correct?
Mat src = imread("foo.png", IMREAD_GRAYSCALE); // read image as grayscale single channel
// Calculate the mean intensity and the std deviation
// Any errors here or am I doing this correctly?
Scalar sMean, sStdDev;
meanStdDev(src, sMean, sStdDev);
double meanIntensity = sMean[0];
double stddevIntensity = sStdDev[0];
// Calculate the average gradient magnitude/strength across the image
// Any errors here or am I doing this correctly?
Mat dX, dY, mag;
Sobel(src, dX, CV_32F, 1, 0, 1);
Sobel(src, dY, CV_32F, 0, 1, 1);
magnitude(dX, dY, mag);
Scalar sMMean, sMStdDev;
meanStdDev(mag, sMMean, sMStdDev);
double magnitudeMean = sMMean[0];
double magnitudeStdDev = sMStdDev[0];
// Calculate the average gradient direction across the image
// Any errors here or am I doing this correctly?
Scalar avgHorizDir = mean(dX);
Scalar avgVertDir = mean(dY);
double avgDir = atan2(-avgVertDir[0], avgHorizDir[0]);
float blurriness = cv::videostab::calcBlurriness(src); // low values = sharper, high values = blurry
Technically those are the correct ways of obtaining the two averages.
The way you compute mean direction uses weighted directional statistics, meaning that pixels without a strong gradient have less influence on the average.
However, for most images this average direction is not very meaningful, as edges exist in all directions and cancel each other out.
If your image is of a single edge, then this will work great.
If your image has lines in it, containing edges in opposite directions, this will not work. In this case, you want to average the double angle (average orientations). The obvious way of doing this is to compute the direction per pixel as an angle, double them, then use directional statistics to average (ie convert back to vectors and average those). Doubling the angle causes opposite directions to be mapped to the same value, thus averaging doesn’t cancel these out.
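As an illustration of the double-angle idea, here is a rough sketch of mine in Python/NumPy (not part of the original answer), assuming gradients dX and dY computed as in the question:

import numpy as np

def average_orientation(dX, dY):
    # dX, dY: per-pixel gradients (e.g. from a Sobel filter), as float arrays
    mag = np.hypot(dX, dY)
    doubled = 2.0 * np.arctan2(dY, dX)     # double the angle so opposite directions coincide
    s = np.sum(mag * np.sin(doubled))      # magnitude-weighted directional statistics
    c = np.sum(mag * np.cos(doubled))
    return 0.5 * np.arctan2(s, c)          # halve to map back to an orientation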
Another simple way to average orientations is to take the average of the tensor field obtained by the outer product of the gradient field with itself, and determine the direction of the eigenvector corresponding to the largest eigenvalue. The tensor field is obtained as follows:
Mat Sxx = dX.mul(dX);   // element-wise products (Mat::mul), not matrix multiplication
Mat Syy = dY.mul(dY);
Mat Sxy = dX.mul(dY);
This should then be averaged:
Scalar mSxx = mean(Sxx);
Scalar mSyy = mean(Syy);
Scalar mSxy = mean(Sxy);
These values form a 2x2 real-valued symmetric matrix:
| mSxx mSxy |
| mSxy mSyy |
It is relatively straightforward to determine its eigendecomposition, and this can be done analytically. I don't have the equations on hand right now, so I'll leave it as an exercise to the reader. :)
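Since the answer leaves the eigendecomposition as an exercise, here is a small sketch (my own addition, in Python) of the standard analytic solution for a 2x2 symmetric matrix [[mSxx, mSxy], [mSxy, mSyy]]:

import math

def dominant_orientation(mSxx, mSyy, mSxy):
    theta = 0.5 * math.atan2(2.0 * mSxy, mSxx - mSyy)    # direction of the dominant eigenvector
    mean = 0.5 * (mSxx + mSyy)
    half_diff = math.hypot(0.5 * (mSxx - mSyy), mSxy)
    return theta, mean + half_diff, mean - half_diff     # orientation, largest and smallest eigenvalue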

How to generate a random quaternion quickly?

I searched around and it turns out the answer to this is surprisingly hard to find. There are algorithms out there that can generate a random orientation in quaternion form, but they involve sqrt and trig functions. I don't really need a uniformly distributed orientation. I just need to generate (many) quaternions such that their randomness in orientation is "good enough". I can't specify what "good enough" is, except that I need to be able to do the generation quickly.
Quoted from http://planning.cs.uiuc.edu/node198.html:
Choose three points u, v, w ∈ [0,1] uniformly at random. A uniform, random quaternion is given by the simple expression:
 h = ( sqrt(1-u) sin(2πv), sqrt(1-u) cos(2πv), sqrt(u) sin(2πw), sqrt(u) cos(2πw))
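A direct transcription of that formula (a small sketch of mine in Python/NumPy, using three uniform samples u, v, w) could be:

import numpy as np

def random_quaternion(rng=np.random.default_rng()):
    u, v, w = rng.random(3)   # three uniform samples in [0, 1)
    return np.array([np.sqrt(1 - u) * np.sin(2 * np.pi * v),
                     np.sqrt(1 - u) * np.cos(2 * np.pi * v),
                     np.sqrt(u) * np.sin(2 * np.pi * w),
                     np.sqrt(u) * np.cos(2 * np.pi * w)])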
From Choosing a Point from the Surface of a Sphere by George Marsaglia:
Generate independent x, y uniformly in (-1..1) until z = x²+y² < 1.
Generate independent u, v uniformly in (-1..1) until w = u²+v² < 1.
Compute s = √((1-z) / w).
Return the quaternion (x, y, su, sv). It's already normalized.
This will generate a uniform random rotation because 4D spheres, unit quaternions and 3D rotations have equivalent measures.
The algorithm uses one square root, one division, and 16/π ≈ 5.09 random numbers on average. C++ code:
Quaternion random_quaternion() {
    double x, y, z, u, v, w, s;
    do { x = random(-1, 1); y = random(-1, 1); z = x*x + y*y; } while (z > 1);
    do { u = random(-1, 1); v = random(-1, 1); w = u*u + v*v; } while (w > 1);
    s = sqrt((1 - z) / w);
    return Quaternion(x, y, s*u, s*v);
}
The simplest way to generate one: just generate 4 random floats and normalize the result if required. If you want to produce rotation matrices later, then normalization can be skipped and the conversion procedure should account for non-unit quaternions.
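For completeness, a sketch of that "normalize four random floats" approach (my own Python; note that this is not uniform over rotations, which the question says is acceptable):

import numpy as np

def quick_random_quaternion(rng=np.random.default_rng()):
    q = rng.uniform(-1.0, 1.0, 4)    # four uniform components (not a uniform rotation)
    return q / np.linalg.norm(q)     # normalize; skip if the consumer handles non-unit quaternions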

Circle estimation from 2D data set

I am doing some computer vision based hand gesture recognising stuff. Here, I want to detect a circle (a circular motion) made by my hand. My initial stages are working fine and I am able to get a blob whose centroid from each frame I am plotting. This is essentially my data set. A collection of 2D co-ordinate points. Now I want to detect a circular type motion and say generate a call to a function which says "Circle Detected". The circle detector will give a YES / NO boolean output.
Here is a sample of the data set I am generating in 40 frames
The x, y values are just plotted to a bitmap image using MATLAB.
My initial hand movement was slow and later I picked up speed to complete the circle within stipulated time (40 frames). There is no hard and fast rule about the number of frames thing but for now I am using a 40 frame sliding window for circle detection (0-39) then (1-40) then (2-41) etc.
I am also calculating the arc-tangent between successive points using:
angle = atan2(prev_y - y, prev_x - x) * 180 / pi;
Now, what approach should I take for detecting a circle? (This sample should result in a YES.) The angle, as I am noticing, is not steadily increasing from 0 to 360; it does increase, but with jumps here and there.
If you are only interested in full or nearly full circles:
I think that the standard parameter estimation approaches (Hough/RANSAC) won't work very well in this case.
Since you have the frame order, and therefore the distances between consecutive blob centers, you can create a nearly uniform sub-sample of the data (say, pick 20 points spaced roughly evenly), calculate the center, and measure the distance of all points from that center.
If it is nearly a circle, all points will be at a similar distance from the center.
If you want to do something slightly more robust, you can:
Compute center (mean) of all points.
Perform gradient descent to update the center: this should be fairly easy and you won't have local minima. The error term I would probably use is max(D) - min(D), where D is the vector of distances between the blob centers and the estimated circle center (but you can use robust statistics instead of max & min)
Evaluate the circle
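A rough sketch (my own Python/NumPy, not part of the original answer) of the simpler distance-spread check described above:

import numpy as np

def looks_like_circle(points, tolerance=0.25):
    # points: (N, 2) array of blob centres from the sliding window
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    dist = np.linalg.norm(pts - center, axis=1)
    spread = (dist.max() - dist.min()) / dist.mean()   # a max(D) - min(D) style error, normalised
    return spread < tolerance   # note: does not check that the points sweep a full revolution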
I would use a least-squares estimation. Numerically, you can use the Nelder-Mead method. You get the circle that best approximates your points, and on the basis of the residual error value you decide whether to consider the circle valid or not.
With points the array of points, xc, yc the coordinates of the center, and r the radius, this could be an example of the error to minimize:
class Circle
{
    private PointF[] _points;

    public Circle(PointF[] points)
    {
        _points = points;
    }

    public double MinimizeFunction(double xc, double yc, double r)
    {
        double d, d2, dx, dy, sum;
        sum = 0;
        foreach (PointF p in _points)
        {
            dx = p.X - xc;
            dy = p.Y - yc;
            d2 = dx * dx + dy * dy;
            // algebraic-distance alternative: sum += (d2 - r * r) * (d2 - r * r);
            d = Math.Sqrt(d2) - r;   // geometric distance from the circle
            sum += d * d;
        }
        return sum;
    }

    public double ResidualError(double xc, double yc, double r)
    {
        return Math.Sqrt(MinimizeFunction(xc, yc, r)) / (_points.Length - 3);
    }
}
There is a slight difference between the commented functional (algebraic distance) and the uncommented one (geometric distance); for practical purposes the difference is negligible, but from a theoretical point of view it is important.
Since you need to supply an initial set of values (xc, yc, r), you can calculate the circle through three points, choosing three points far from each other.
If you need more details on "circle given three points" or Nelder-Mead you can google or ask me here.
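For the initial (xc, yc, r) guess, here is a small sketch of mine (Python, not from the original answer) of the "circle through three points" computation via the circumcenter formula:

from math import hypot

def circle_from_3_points(p1, p2, p3):
    (ax, ay), (bx, by), (cx, cy) = p1, p2, p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        raise ValueError("points are (nearly) collinear")
    xc = ((ax*ax + ay*ay) * (by - cy) + (bx*bx + by*by) * (cy - ay) + (cx*cx + cy*cy) * (ay - by)) / d
    yc = ((ax*ax + ay*ay) * (cx - bx) + (bx*bx + by*by) * (ax - cx) + (cx*cx + cy*cy) * (bx - ax)) / d
    return xc, yc, hypot(ax - xc, ay - yc)   # centre and radius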
