SURF: How could we get the value of sigma from the keypoint radius - opencv

In the SURF technique, and more precisely in the feature description stage, the authors state (if I understand correctly) that the descriptor is computed over a region of 20 times sigma, where sigma is the scale at which the keypoint was detected.
Sigma = 0.4 × L, where L = 2^octave × level + 1. If we use the OpenCV implementation, the DetectAndCompute function returns in Keypoint.size the radius of the circle surrounding the keypoint.
My question is: how can we get the value of sigma from the radius value?

According to these lines of OpenCV's SURF code:
KeyPoint& kp = (*keypoints)[k];
float size = kp.size;
Point2f center = kp.pt;
/* The sampling intervals and wavelet sized for selecting an orientation
and building the keypoint descriptor are defined relative to 's' */
float s = size*1.2f/9.0f;

This value, s = size*1.2f/9.0f, is not mentioned in Bay et al.'s paper, which gives scale = L*0.4 or scale = L*1.2/3. Can anyone explain this part?
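A minimal worked check of the arithmetic, as a Python sketch. It assumes, as the quoted snippet implies, that kp.size holds the side length of SURF's box filter (9 for the smallest one, which Bay et al. relate to σ = 1.2), and that a filter's side is 3 × L, i.e. three lobes of size L = 2^octave × level + 1. The helper names are hypothetical, not OpenCV API.

```python
# Hypothetical helpers (not OpenCV API) comparing the two SURF scale formulas.

def lobe_size(octave, level):
    """L = 2^octave * level + 1, with octave and level counted from 1."""
    return 2 ** octave * level + 1

def sigma_from_lobe(lobe):
    """Scale as quoted in the question: sigma = 0.4 * L."""
    return 0.4 * lobe

def sigma_from_size(size):
    """OpenCV's s = size * 1.2 / 9: a 9x9 box filter corresponds to sigma = 1.2."""
    return size * 1.2 / 9.0

def description_region_side(size):
    """Side of the SURF description region, 20 * sigma."""
    return 20.0 * sigma_from_size(size)

if __name__ == "__main__":
    for octave in (1, 2):
        for level in (1, 2, 3, 4):
            lobe = lobe_size(octave, level)
            size = 3 * lobe  # assumption: box filter side = 3 lobes
            print(octave, level, size,
                  round(sigma_from_lobe(lobe), 6), round(sigma_from_size(size), 6))
```

Under that assumption, (1.2/9) × 3L = 0.4L, so the two formulas give the same σ, and inverting gives σ = kp.size × 1.2 / 9, which is the s in the snippet.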

Related

How to set the region (and its shape) over which the SIFT descriptor is computed?

A similar question has been asked here. However, I could not understand the answer clearly.
I understand that SIFT computation has the following steps:
Finding scale space extrema
Keypoint localization (and filtering)
Orientation assignment (using computation of gradient magnitude and orientation)
Create SIFT descriptor
My question is about the fourth step: how is the region over which the SIFT descriptor is computed set? And how is the shape of that region determined?
Suppose the scale-space extremum was found at scale "s" in the second octave. I use the gradient orientation to align to a canonical orientation. How do I set the region over which the SIFT descriptor is computed using this information? Do I use the scale or the gradient magnitude to find that region? And how is the shape of the region determined?
So this was surprisingly tricky to find an answer for.
David Lowe's original paper only seemed to provide a vague theoretical explanation of how his algorithm works.
And as far as I know, his official implementation never had its feature descriptor code open-sourced.
So I'm basing my answer on what I consider the next-most canonical implementation of the SIFT algorithm, Rob Hess' OpenSIFT implementation, which became the base for OpenCV's official implementation.
Anyway, here is my understanding of how SIFT roughly works:
Once you have located your extrema, you should know which octave & interval of the Gaussian Pyramid the extrema belongs to.
Based on Rob's code (these two functions on lines 1026-1112), the feature descriptor is calculated from the blurred image of that octave & interval.
The region for calculating the SIFT descriptor is a square surrounding the keypoint. This Medium article also seems to agree (see illustration).
The SIFT formula for the Gaussian kernel scale, relative to the original image size, is (reference):
base_scale * 2^(octave + interval / intervals_per_octave)
Or this formula, if working relative to the halved image in each octave:
base_scale * 2^(interval / intervals_per_octave)
where the original paper determined the parameters experimentally as:
base_scale = 1.6 and intervals_per_octave = 3
So if your SIFT was set to have 3 intervals per octave with a base Gaussian scale of 1.6, and the extremum was found in octave 2, interval 3,
the image will have been blurred by a Gaussian kernel of scale 1.6 * 2^(2 + 3/3) = 12.80 pixels.
Now, the actual array size of the Gaussian kernel depends on the code you use, since the scale and the kernel size can be set independently.
For MATLAB, I found helpful guidelines in this SO thread.
The selected answer recommends a kernel width of 6 times the scale (i.e. the 3-sigma rule), so our kernel width (and height) is 12.80 * 6 ≈ 77 pixels,
giving a SIFT descriptor region of 77x77 pixels.
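The two scale formulas and the 6σ rule of thumb can be written as a small Python sketch. The function names, and the choice to round the width up to an odd number so the window has a centre pixel, are my own assumptions, not taken from OpenSIFT or OpenCV.

```python
import math

def sift_sigma(octave, interval, base_scale=1.6, intervals_per_octave=3,
               relative_to_octave=False):
    """Gaussian scale of the blurred image that holds the extremum.

    relative_to_octave=False: scale w.r.t. the original image size.
    relative_to_octave=True:  scale w.r.t. the octave's own (halved) image.
    """
    exponent = interval / intervals_per_octave
    if not relative_to_octave:
        exponent += octave
    return base_scale * 2 ** exponent

def window_width(sigma, factor=6):
    """Odd kernel/region width from the 'factor x sigma' rule of thumb."""
    width = math.ceil(factor * sigma)
    return width if width % 2 == 1 else width + 1

sigma = sift_sigma(octave=2, interval=3)
print(sigma, window_width(sigma))  # 12.8 77, the example above
```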
Meanwhile, the OpenCV implementation appears to leave the kernel size to OpenCV's own built-in Gaussian blur function.
Line 246 of OpenCV's code passes the Gaussian blur parameter ksize as zeros,
for which the official docs only state that the kernel size will be "computed from sigma", without defining how it is actually calculated...
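For what it's worth, my reading of OpenCV's source (createGaussianKernels in smooth.cpp) is that, when ksize is zero, the width is derived as cvRound(sigma * (depth == CV_8U ? 3 : 4) * 2 + 1) | 1, i.e. roughly a 6σ window for 8-bit images and an 8σ window otherwise, forced to be odd. Treat this as an assumption to check against the OpenCV version you use; the Python below only mirrors that expression and is not an OpenCV API.

```python
def opencv_auto_ksize(sigma, is_8u):
    """Mirror of the kernel-size expression assumed above (hypothetical helper).

    Python's round() rounds halves to even; cvRound under the default FPU
    rounding mode should behave the same way (an assumption).
    """
    factor = 3 if is_8u else 4                      # kernel "radius" in units of sigma
    return int(round(sigma * factor * 2 + 1)) | 1   # "| 1" forces an odd size

print(opencv_auto_ksize(12.8, is_8u=True), opencv_auto_ksize(12.8, is_8u=False))  # 79 103
```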
Finally, for Rob's implementation, I have to admit that I couldn't quite understand what was happening in this final step. ¯\_(ツ)_/¯
In lines 1026-1112, Rob defined the code below, which shows how he calculates the orientation histogram for the SIFT descriptor.
The code shows he defined a radius and used the nested for-loops with i and j to iterate through the square region around the keypoint, located at point (r,c).
Yet what I don't really understand is:
How he defined radius: the Gaussian scale scl multiplied by a seemingly arbitrary constant, SIFT_DESCR_SCL_FCTR = 3.0
As well as the expression hist_width * sqrt(2) * ( d + 1.0 ) * 0.5 + 0.5, where d = SIFT_DESCR_WIDTH = 4
hist_width = SIFT_DESCR_SCL_FCTR * scl;
radius = hist_width * sqrt(2) * ( d + 1.0 ) * 0.5 + 0.5;
for( i = -radius; i <= radius; i++ )
for( j = -radius; j <= radius; j++ )
{
/*
Calculate sample's histogram array coords rotated relative to ori.
Subtract 0.5 so samples that fall e.g. in the center of row 1 (i.e.
r_rot = 1.5) have full weight placed in row 1 after interpolation.
*/
c_rot = ( j * cos_t - i * sin_t ) / hist_width;
r_rot = ( j * sin_t + i * cos_t ) / hist_width;
rbin = r_rot + d / 2 - 0.5;
cbin = c_rot + d / 2 - 0.5;
if( rbin > -1.0 && rbin < d && cbin > -1.0 && cbin < d )
if( calc_grad_mag_ori( img, r + i, c + j, &grad_mag, &grad_ori ))
{
grad_ori -= ori;
while( grad_ori < 0.0 )
grad_ori += PI2;
while( grad_ori >= PI2 )
grad_ori -= PI2;
obin = grad_ori * bins_per_rad;
w = exp( -(c_rot * c_rot + r_rot * r_rot) / exp_denom );
interp_hist_entry( hist, rbin, cbin, obin, grad_mag * w, d, n );
}
}
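One plausible reading of the radius expression (my interpretation, not documented in OpenSIFT): each of the d × d histogram cells is hist_width = 3 × scl pixels wide. One extra cell (d + 1) is added because interpolation lets samples up to half a cell outside the grid still contribute (hence the rbin > -1.0 and rbin < d test). The factor sqrt(2) * 0.5 is half the diagonal, so the axis-aligned loop still covers the square after it is rotated by ori. The final + 0.5 rounds, assuming radius is stored as an int. The Python below restates the expression and the in-window test from the loop, using the names from the C code.

```python
import math

# Constants as quoted from OpenSIFT above.
SIFT_DESCR_SCL_FCTR = 3.0
SIFT_DESCR_WIDTH = 4

def descr_radius(scl, d=SIFT_DESCR_WIDTH):
    """Return (hist_width, radius) as computed before the loop."""
    hist_width = SIFT_DESCR_SCL_FCTR * scl  # pixels per histogram cell
    # (d + 1) cells: the d x d grid plus half a cell of interpolation on each side.
    # sqrt(2) * 0.5: half the diagonal, so the rotated square fits in the loop.
    # + 0.5, then int(): round to the nearest pixel (the value is positive).
    radius = int(hist_width * math.sqrt(2) * (d + 1.0) * 0.5 + 0.5)
    return hist_width, radius

def sample_bins(i, j, ori, hist_width, d=SIFT_DESCR_WIDTH):
    """Rotate the offset (i, j) into the keypoint frame, as in the loop body.

    Returns (rbin, cbin) if the sample contributes to the descriptor, else None.
    """
    cos_t, sin_t = math.cos(ori), math.sin(ori)
    c_rot = (j * cos_t - i * sin_t) / hist_width
    r_rot = (j * sin_t + i * cos_t) / hist_width
    rbin = r_rot + d // 2 - 0.5  # d / 2 is integer division in the C code
    cbin = c_rot + d // 2 - 0.5
    if -1.0 < rbin < d and -1.0 < cbin < d:
        return rbin, cbin
    return None

hist_width, radius = descr_radius(1.6)
print(radius)  # 17
# A sample 16 px away is outside the square when ori = 0 ...
print(sample_bins(16, 0, 0.0, hist_width))
# ... but inside it once the square is rotated by 45 degrees.
print(sample_bins(16, 0, math.pi / 4, hist_width))
```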
But regardless of how the exact size of the region is calculated, I think the general concept is the same: the region size is calculated from the original Gaussian scale.
Besides, given that the features are supposed to be "weighted by a Gaussian window" (original paper, section 6.1, page 15),
as long as the region you define is large enough to contain most of the meaningful orientation histograms, you are fine.
In summary:
The SIFT descriptor is calculated from the halved & blurred image of the same octave/interval as the keypoint (OpenSIFT)
The region for the SIFT descriptor is a square shape surrounding the keypoint (medium)(image)
The region size is calculated from the Gaussian kernel scale; the exact calculation varies between implementations, but an easy rule of thumb is a width of 6 times the kernel scale (thread)

How Convexity Defect is calculated in OpenCV?

What is the algorithm used in OpenCV function convexityDefects() to calculate the convexity defects of a contour?
Please, describe and illustrate the high-level operation of the algorithm, along with its inputs and outputs.
Based on the documentation, the inputs are two lists of coordinates:
contour defining the original contour (red on the image below)
convexhull defining the convex hull corresponding to that contour (blue on the image below)
The algorithm works in the following manner:
If the contour or the hull contains 3 or fewer points, the contour is always convex, and no more processing is needed. The algorithm ensures that both the contour and the hull are traversed in the same orientation.
N.B.: In further explanation I assume they are in the same orientation, and ignore the details regarding representation of the floating point depth as an integer.
Then, for each pair of adjacent hull points (H[i], H[i+1]), which defines one edge of the convex hull, calculate the distance from that edge to each contour point C[n] lying between H[i] and H[i+1] (excluding C[n] == H[i+1]). If the distance is greater than zero, a defect is present. When a defect is present, record i, i+1, the index (n) of the contour point where the maximum distance is located, and that maximum distance.
Distance is calculated in the following manner:
dx0 = H[i+1].x - H[i].x
dy0 = H[i+1].y - H[i].y
if (dx0 is 0) and (dy0 is 0) then
scale = 0
else
scale = 1 / sqrt(dx0 * dx0 + dy0 * dy0)
dx = C[n].x - H[i].x
dy = C[n].y - H[i].y
distance = abs(-dy0 * dx + dx0 * dy) * scale
It may be easier to visualize in terms of vectors:
C: defect vector from H[i] to C[n]
H: hull edge vector from H[i] to H[i+1]
H_rot: hull edge vector H rotated 90 degrees
U_rot: unit vector in direction of H_rot
H components are [dx0, dy0], so rotating 90 degrees gives [-dy0, dx0].
scale is used to find U_rot from H_rot, but because divisions are more computationally expensive than multiplications, the inverse is used as an optimization. It's also pre-calculated before the loop over C[n] to avoid recomputing each iteration.
|H| = sqrt(dx0 * dx0 + dy0 * dy0)
U_rot = H_rot / |H| = H_rot * scale
Then, a dot product between C and U_rot gives the perpendicular distance from the defect point to the hull edge, and abs() is used to get a positive magnitude in any orientation.
distance = abs(U_rot.C) = abs(-dy0 * dx + dx0 * dy) * scale
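To check that the scalar pseudocode and the vector description compute the same thing, here is a small Python sketch of both. The function names are mine; points are (x, y) tuples.

```python
import math

def distance_scalar(h0, h1, c):
    """The pseudocode above, line by line."""
    dx0, dy0 = h1[0] - h0[0], h1[1] - h0[1]
    if dx0 == 0 and dy0 == 0:
        scale = 0.0
    else:
        scale = 1.0 / math.sqrt(dx0 * dx0 + dy0 * dy0)  # 1 / |H|, computed once
    dx, dy = c[0] - h0[0], c[1] - h0[1]
    return abs(-dy0 * dx + dx0 * dy) * scale

def distance_vector(h0, h1, c):
    """The same distance written with the vectors C, H, H_rot and U_rot."""
    C = (c[0] - h0[0], c[1] - h0[1])   # defect vector H[i] -> C[n]
    H = (h1[0] - h0[0], h1[1] - h0[1]) # hull edge H[i] -> H[i+1]
    H_rot = (-H[1], H[0])              # H rotated by 90 degrees
    norm = math.hypot(H[0], H[1])      # |H|
    if norm == 0:
        return 0.0                     # degenerate edge, like scale = 0 above
    U_rot = (H_rot[0] / norm, H_rot[1] / norm)
    return abs(U_rot[0] * C[0] + U_rot[1] * C[1])  # |U_rot . C|

print(distance_scalar((0, 0), (10, 0), (4, -3)), distance_vector((0, 0), (10, 0), (4, -3)))
```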
In the scenario depicted in the image above, in the first iteration the edge is defined by H[0] and H[1]. The contour points to examine for this edge are C[0], C[1], and C[2] (since C[3] == H[1]).
There are defects at C[1] and C[2]. The defect at C[1] is the deepest, so the algorithm will record (0, 1, 1, 50).
The next edge is defined by H[1] and H[2], and corresponding contour point C[3]. No defect is present, so nothing is recorded.
The next edge is defined by H[2] and H[3], and corresponding contour point C[4]. No defect is present, so nothing is recorded.
Since C[5] == H[3], the last contour point can be ignored -- there can't be a defect there.
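Putting the steps together, here is a simplified Python sketch of the procedure as described in this answer. It is not OpenCV's code. Following the explanation, it records hull indices (i, i+1) and a floating-point depth, assumes the contour and the hull share the same orientation, and wraps around from the last hull point back to the first. The example contour is made up.

```python
import math

def _edge_distance(h0, h1, c):
    """Perpendicular distance from c to the edge h0 -> h1 (0 for a degenerate edge)."""
    dx0, dy0 = h1[0] - h0[0], h1[1] - h0[1]
    scale = 0.0 if dx0 == 0 and dy0 == 0 else 1.0 / math.hypot(dx0, dy0)
    return abs(-dy0 * (c[0] - h0[0]) + dx0 * (c[1] - h0[1])) * scale

def convexity_defects(contour, hull_idx):
    """contour: list of (x, y); hull_idx: indices into contour of the hull points.

    Returns (i, i+1, n, depth) tuples: the hull edge, the index of the deepest
    contour point on that edge, and its distance from the edge.
    """
    if len(contour) <= 3 or len(hull_idx) <= 3:
        return []  # always convex, nothing to do
    defects = []
    for i in range(len(hull_idx)):
        start, end = hull_idx[i], hull_idx[(i + 1) % len(hull_idx)]
        h0, h1 = contour[start], contour[end]
        best_n, best_depth = -1, 0.0
        n = start
        while n != end:  # C[n] between H[i] and H[i+1], excluding H[i+1]
            depth = _edge_distance(h0, h1, contour[n])
            if depth > best_depth:
                best_n, best_depth = n, depth
            n = (n + 1) % len(contour)
        if best_depth > 0:
            defects.append((i, (i + 1) % len(hull_idx), best_n, best_depth))
    return defects

contour = [(0, 0), (10, 0), (10, 10), (5, 3), (0, 10)]  # a square with a notch
hull_idx = [0, 1, 2, 4]                                # hull points, same orientation
print(convexity_defects(contour, hull_idx))
```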

How do you calculate the average gradient direction and average gradient strength/magnitude

In OpenCV how do you calculate the average gradient strength in a Mat and the average gradient direction?
I have sourced the below methods by googling but I want to confirm I am actually doing this correctly before moving onto the next step.
Is this correct?
#include <opencv2/opencv.hpp>
#include <opencv2/videostab.hpp>
using namespace cv;

int main()
{
    Mat src = imread("foo.png", IMREAD_GRAYSCALE); // read image as grayscale single channel
    // Calculate the mean intensity and the std deviation
    // Any errors here or am I doing this correctly?
    Scalar sMean, sStdDev;
    meanStdDev(src, sMean, sStdDev);
    double intensityMean = sMean[0];     // renamed so it doesn't hide cv::mean
    double intensityStdDev = sStdDev[0];
    // Calculate the average gradient magnitude/strength across the image
    // Any errors here or am I doing this correctly?
    Mat dX, dY, mag;                     // renamed so it doesn't hide cv::magnitude
    Sobel(src, dX, CV_32F, 1, 0, 1);
    Sobel(src, dY, CV_32F, 0, 1, 1);
    magnitude(dX, dY, mag);
    Scalar sMMean, sMStdDev;
    meanStdDev(mag, sMMean, sMStdDev);
    double magnitudeMean = sMMean[0];
    double magnitudeStdDev = sMStdDev[0];
    // Calculate the average gradient direction across the image
    // Any errors here or am I doing this correctly?
    Scalar avgHorizDir = mean(dX);
    Scalar avgVertDir = mean(dY);
    double avgDir = atan2(-avgVertDir[0], avgHorizDir[0]);
    float blurriness = videostab::calcBlurriness(src); // low values = sharper. High values = blurry
    return 0;
}
Technically those are the correct ways of obtaining the two averages.
The way you compute mean direction uses weighted directional statistics, meaning that pixels without a strong gradient have less influence on the average.
However, for most images this average direction is not very meaningful, as edges exist in all directions and cancel out.
If your image is of a single edge, then this will work great.
If your image has lines in it, containing edges in opposite directions, this will not work. In this case, you want to average the double angle (i.e. average orientations). The obvious way of doing this is to compute the direction per pixel as an angle, double it, then use directional statistics to average (i.e. convert back to vectors and average those). Doubling the angle maps opposite directions to the same value, so averaging doesn't cancel them out.
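A minimal Python sketch of the two averages, on plain lists of per-pixel gradient components (gx, gy), e.g. the values of dX and dY flattened. Weighting each doubled angle by the gradient magnitude is my own choice, mirroring the weighting described for the direction average. The y-axis sign flip used in the question is left aside.

```python
import math

def mean_direction(gx, gy):
    """Average gradient direction: angle of the summed gradient vectors."""
    return math.atan2(sum(gy), sum(gx))

def mean_orientation(gx, gy):
    """Average orientation via angle doubling, weighted by gradient magnitude.

    Opposite gradients map to the same doubled angle, so they reinforce
    instead of cancelling. The result is only defined modulo pi.
    """
    c = s = 0.0
    for x, y in zip(gx, gy):
        mag = math.hypot(x, y)
        theta = math.atan2(y, x)
        c += mag * math.cos(2.0 * theta)
        s += mag * math.sin(2.0 * theta)
    return 0.5 * math.atan2(s, c)  # halve to undo the doubling

# Two opposite edges of a vertical line: the gradients cancel as vectors ...
gx, gy = [1.0, -1.0], [0.0, 0.0]
print(math.hypot(sum(gx), sum(gy)))  # 0.0: no meaningful mean direction
# ... but the doubled angles agree, giving an orientation close to 0 (modulo pi).
print(mean_orientation(gx, gy))
```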
Another simple way to average orientations is to take the average of the tensor field obtained by the outer product of the gradient field with itself, and determine the direction of the eigenvector corresponding to the largest eigenvalue. The tensor field is obtained as follows:
Mat Sxx = dX.mul(dX); // element-wise products; dX * dX would be a matrix product
Mat Syy = dY.mul(dY);
Mat Sxy = dX.mul(dY);
This should then be averaged:
Scalar mSxx = mean(Sxx);
Scalar mSyy = mean(Syy);
Scalar mSxy = mean(Sxy);
These values form a 2x2 real-valued symmetric matrix:
| mSxx mSxy |
| mSxy mSyy |
It is relatively straightforward to determine its eigendecomposition, and it can be done analytically. I don’t have the equations on hand right now, so I’ll leave it as an exercise to the reader. :)
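For reference, the closed form for a symmetric 2x2 matrix [[a, b], [b, c]] is λ = (a + c)/2 ± sqrt(((a − c)/2)² + b²), and the eigenvector of the larger eigenvalue points at angle θ = ½·atan2(2b, a − c). A Python sketch on flattened gradient lists (function names are mine):

```python
import math

def mean_structure_tensor(gx, gy):
    """Average of the outer product g g^T over all pixels: (mSxx, mSyy, mSxy)."""
    n = len(gx)
    m_sxx = sum(x * x for x in gx) / n
    m_syy = sum(y * y for y in gy) / n
    m_sxy = sum(x * y for x, y in zip(gx, gy)) / n
    return m_sxx, m_syy, m_sxy

def tensor_eigen(m_sxx, m_syy, m_sxy):
    """Closed-form eigendecomposition of [[m_sxx, m_sxy], [m_sxy, m_syy]].

    Returns (lambda1, lambda2, theta) with lambda1 >= lambda2; the eigenvector
    of lambda1 is (cos(theta), sin(theta)), i.e. the dominant orientation.
    """
    half_trace = 0.5 * (m_sxx + m_syy)
    root = math.hypot(0.5 * (m_sxx - m_syy), m_sxy)
    theta = 0.5 * math.atan2(2.0 * m_sxy, m_sxx - m_syy)
    return half_trace + root, half_trace - root, theta

# Opposite gradients (1, 1) and (-1, -1) cancel as vectors, but not in the tensor.
l1, l2, theta = tensor_eigen(*mean_structure_tensor([1.0, -1.0], [1.0, -1.0]))
print(l1, l2, theta)  # l1 = 2, l2 = 0, theta = pi/4
```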

How to calculate quantized angle?

I am looking at the code for the Hough transform in image segmentation. The following code is from Computer Vision by Linda Shapiro. Can somebody tell me what quantize_angle is and how I can compute it?
The Hough transform looks for straight lines (or other features) in an image and represents these features as points in a different 2D coordinate system, where one axis represents the angle θ of a detected line, and the other represents the distance δ from this line to the centre of the image.
Source: Wikipedia
To produce a Hough transform of finite dimensions, both θ and δ have to be quantized. For example, if θ lies in the range (0 ≤ θ < 2π), then you could map it to the range 0–255 by a function such as the following:
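#include <math.h>

int quantize_angle(float theta) {
    /* theta in [0, 2*pi) -> 0..255: 128/pi bins per radian, rounded to nearest */
    int q = (int)floor(theta * 128.0 / 3.141592654 + 0.5);
    return q % 256; /* angles that round up to 256 (theta near 2*pi) wrap to 0 */
}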
This will result in a Hough transform that is 256 pixels wide.
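Since both θ and δ have to be quantized, here is a Python sketch of the same angle mapping plus a matching distance mapping. quantize_distance, n_bins and max_delta are hypothetical; max_delta would be the largest δ you expect, for example half the image diagonal if δ is measured from the centre.

```python
import math

def quantize_angle(theta, n_bins=256):
    """theta in [0, 2*pi) -> bin in [0, n_bins); same rounding as the C version."""
    q = math.floor(theta * n_bins / (2.0 * math.pi) + 0.5)
    return q % n_bins  # theta just below 2*pi rounds to n_bins and wraps to 0

def quantize_distance(delta, max_delta, n_bins):
    """delta in [0, max_delta] -> bin in [0, n_bins); clamps the top edge."""
    q = math.floor(delta * n_bins / max_delta)
    return min(max(q, 0), n_bins - 1)

print(quantize_angle(0.0), quantize_angle(math.pi / 2), quantize_angle(math.pi))  # 0 64 128
```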

Laplacian of gaussian filter use

This is a formula for LoG filtering:
(source: ed.ac.uk)
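The formula image did not survive. For reference, this is the form given on the HIPR2 Laplacian of Gaussian page linked at the end of this thread, as far as I can tell; it matches the MATLAB expression further down, where sigmaSq = σ². Note that only σ appears: the filter size is chosen separately, when the formula is sampled on a grid.

```latex
\mathrm{LoG}(x, y) = -\frac{1}{\pi \sigma^{4}}
  \left[ 1 - \frac{x^{2} + y^{2}}{2 \sigma^{2}} \right]
  e^{-\frac{x^{2} + y^{2}}{2 \sigma^{2}}}
```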
Also, in applications of LoG filtering I see that the function is called with only one parameter, sigma (σ).
I want to try LoG filtering using that formula (my previous attempt used a Gaussian filter followed by a Laplacian filter with some filter-window size).
But looking at that formula, I can't understand how the size of the filter is connected to it. Does it mean that the filter size is fixed?
Can you explain how to use it?
As you've probably figured out by now from the other answers and links, the LoG filter detects edges and lines in the image. What is still missing is an explanation of what σ is.
σ is the scale of the filter. Is a one-pixel-wide line a line or noise? Is a line 6 pixels wide a line or an object with two distinct parallel edges? Is a gradient that changes from black to white across 6 or 8 pixels an edge or just a gradient? It's something you have to decide, and the value of σ reflects your decision: the larger σ is, the wider the lines, the smoother the edges, and the more noise is ignored.
Do not confuse the scale of the filter (σ) with the size of its discrete approximation (usually called the stencil). In Paul's link σ=1.4 and the stencil size is 9. While a stencil size of 4σ to 6σ is usually reasonable, these two quantities are quite independent. A larger stencil provides a better approximation of the filter, but in most cases you don't need a very good approximation.
This was something that confused me too, and it wasn't until I had to do the same as you for a uni project that I understood what you were supposed to do with the formula!
You can use this formula to generate a discrete LoG filter. If you write a bit of code to implement the formula, you can then generate a filter for use in image convolution. To generate, say, a 5x5 template, simply call the code with x and y ranging from -2 to +2.
This will generate the values to use in a LoG template. If you graph the values this produces you should see the "mexican hat" shape typical of this filter, like so:
(source: ed.ac.uk)
You can fine-tune the template by changing its width (the size) and the sigma value (how broad the peak is). The wider and broader the template, the less affected by noise the result will be, because it operates over a wider area.
Once you have the filter, you can apply it to the image by convolving the template with the image. If you've not done this before, check out these tutorials (java applet tutorials; more mathsy).
Essentially, at each pixel location, you "place" your convolution template, centred at that pixel. You then multiply the surrounding pixel values by the corresponding "pixel" in the template and add up the result. This is then the new pixel value at that location (typically you also have to normalise (scale) the output to bring it back into the correct value range).
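A Python sketch of the two steps described so far: sampling the LoG formula on an odd-sized grid, then convolving. Two choices here are my own assumptions rather than part of the answer. First, the kernel is shifted so its entries sum to zero; truncating the kernel leaves a small residual, and removing it makes flat regions give a zero response. Second, the convolution only produces output where the kernel fits entirely ("valid" mode), so no border handling is needed.

```python
import math

def log_value(x, y, sigma):
    """-1/(pi*sigma^4) * (1 - r^2/(2 sigma^2)) * exp(-r^2/(2 sigma^2))."""
    r2 = x * x + y * y
    s2 = sigma * sigma
    return -1.0 / (math.pi * s2 * s2) * (1.0 - r2 / (2.0 * s2)) * math.exp(-r2 / (2.0 * s2))

def log_kernel(size, sigma, zero_sum=True):
    """Sample the LoG on a size x size grid centred on (0, 0)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    k = [[log_value(x, y, sigma) for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    if zero_sum:  # remove the residual left by truncating the kernel
        mean = sum(map(sum, k)) / (size * size)
        k = [[v - mean for v in row] for row in k]
    return k

def convolve(image, kernel):
    """'Valid' 2D convolution of a list-of-lists image with an odd kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    # flipped kernel = true convolution (no effect for a symmetric LoG)
                    acc += image[i + a][j + b] * kernel[kh - 1 - a][kw - 1 - b]
            row.append(acc)
        out.append(row)
    return out

kernel = log_kernel(9, 1.4)  # the sigma = 1.4, 9x9 case from the HIPR2 page
image = [[0.0] * 6 + [100.0] * 6 for _ in range(12)]  # a vertical step edge
print([round(v, 2) for v in convolve(image, kernel)[0]])
```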
The code below gives a rough idea of how you might implement this. Please forgive any mistakes / typos etc. as it hasn't been tested.
I hope this helps.
private float LoG(float x, float y, float sigma)
{
    // LoG(x, y) = -1/(pi*sigma^4) * (1 - (x^2+y^2)/(2*sigma^2)) * exp(-(x^2+y^2)/(2*sigma^2))
    double r2 = x * x + y * y;
    double s2 = sigma * sigma;
    return (float) (-1.0 / (Math.PI * s2 * s2) * (1.0 - r2 / (2.0 * s2)) * Math.exp(-r2 / (2.0 * s2)));
}
private float[][] GenerateTemplate(int templateSize, float sigma)
{
    // Make sure it's an odd number for convenience
    if (templateSize % 2 == 0)
    {
        throw new IllegalArgumentException("templateSize must be odd");
    }
    // Create the data array
    float[][] template = new float[templateSize][templateSize];
    // Work out the "min and max" values. LoG is centered around 0, 0
    // so, for a size 5 template (say) we want to get the values from
    // -2 to +2, ie: -2, -1, 0, +1, +2 and feed those into the formula.
    int min = -templateSize / 2;
    int max = templateSize / 2;
    for (int x = min; x <= max; ++x)
    {
        for (int y = min; y <= max; ++y)
        {
            // Get the LoG value for this (x,y) pair; subtracting min indexes the array from 0
            template[x - min][y - min] = LoG(x, y, sigma);
        }
    }
    return template;
}
Just for visualization purposes, here is a simple Matlab 3D colored plot of the Laplacian of Gaussian (Mexican Hat) wavelet. You can change the sigma(σ) parameter and see its effect on the shape of the graph:
sigmaSq = 0.5 % Square of σ parameter
[x y] = meshgrid(linspace(-3,3), linspace(-3,3));
z = (-1/(pi*(sigmaSq^2))) .* (1-((x.^2+y.^2)/(2*sigmaSq))) .*exp(-(x.^2+y.^2)/(2*sigmaSq));
surf(x,y,z)
You could also compare the effects of the sigma parameter on the Mexican Hat doing the following:
t = -5:0.01:5;
sigma = 0.5;
mexhat05 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
sigma = 1;
mexhat1 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
sigma = 2;
mexhat2 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
plot(t, mexhat05, 'r', ...
t, mexhat1, 'b', ...
t, mexhat2, 'g');
Or simply use the Wavelet toolbox provided by Matlab as follows:
lb = -5; ub = 5; n = 1000;
[psi,x] = mexihat(lb,ub,n);
plot(x,psi), title('Mexican hat wavelet')
I found this useful when implementing this for edge detection in computer vision. Although not the exact answer, hope this helps.
It appears to be a continuous circular filter whose radius is sqrt(2) * sigma. If you want to implement this for image processing you'll need to approximate it.
There's an example for sigma = 1.4 here: http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm

Resources