I have been studying the algorithms of card.io and I have some difficulties when reading the Hough transform script.
For Hough transform, it is required to setup the accumulator, a grid that stores the vote in (rho, theta) space.
In card.io-dmz/cv/hough.cpp (https://github.com/card-io/card.io-dmz/blob/master/cv/hough.cpp#L99), line 99 says the number of rho values, numrho, is given by
numrho = cvRound(((width + height) * 2 + 1) / rho);
Here width and height are the dimension of the ROI and rho is the distance resolution.
Question: I do not understand why the numerator is (width + height) * 2 + 1.
My guess is that the + 1 is to count the zero value, and the * 2 is to count both positive and negative rho.
But I still don't understand why width + height appears. I think it would be more intuitive to replace it with sqrt(width*width + height*height), which is the largest possible value of rho.
This setting is also used in OpenCV (see this link: https://github.com/opencv/opencv/blob/master/modules/imgproc/src/hough.cpp#L128)
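For what it's worth, here is a quick numeric check (my own snippet, not from card.io or OpenCV) comparing the tight diagonal bound sqrt(width^2 + height^2) against the looser width + height used in the numerator:

import math

for width, height in [(100, 100), (640, 480), (1280, 720)]:
    diagonal = math.hypot(width, height)   # largest |rho| actually reachable
    linear = width + height                # bound used in the numrho formula
    print(f"{width}x{height}: diagonal = {diagonal:.1f}, width + height = {linear}")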
Any help would be appreciated. Thanks
Related
A similar question has been asked here; however, I could not understand it clearly.
I understand that SIFT computation has the following steps:
Finding scale space extrema
Keypoint localization (and filtering)
Orientation assignment (using computation of gradient magnitude and orientation)
Create SIFT descriptor
My question is for the fourth step: How to set the region over which the SIFT descriptor is computed? Also how is the shape of the region for SIFT computation determined?
Suppose the scale-space extremum was found at scale "s" in the second octave, and I use the gradient orientation to align to a canonical orientation. How do I set the region of computation of the SIFT descriptor using this information? Do I use the scale or the magnitude of the gradient to find the region on which SIFT is to be computed? And how is the shape of the region determined?
So this was surprisingly tricky to find an answer for.
David Lowe's original paper only seemed to provide a vague theoretical explanation of how his algorithm worked.
And as far as I know, his official implementation never had its feature descriptor code open-sourced.
So I'm basing my answer on what I consider the next-most canonical implementation of the SIFT algorithm: Rob Hess' OpenSIFT implementation,
which became the basis for OpenCV's official implementation.
Anyway, here is my understanding of how SIFT roughly works:
Once you have located your extrema, you should know which octave & interval of the Gaussian pyramid each extremum belongs to.
Based on Rob's code (these two functions on lines 1026-1112), the feature descriptor is calculated from the blurred image of that octave & interval.
And the region for calculating SIFT is a square shape surrounding the keypoint. This medium article also seems to agree (see illustration).
The SIFT formula for the Gaussian Kernel scale, relative to the original image size is (reference):
base_scale * 2^(octave + interval / intervals_per_octave)
Or this formula if working relative to the halved image in each octave:
base_scale * 2^(interval / intervals_per_octave)
Where the original paper defined the parameters through experiments as:
base_scale = 1.6 and intervals_per_octave = 3
So if your SIFT was set to have 3 intervals per octave, with a base Gaussian scale of 1.6, and the extremum was found at octave 2, interval 3,
the image will have been blurred by a Gaussian kernel of scale 1.6 * 2^(2 + 3/3) = 12.80 pixels.
Now the actual array size of the Gaussian kernel will depend on the code you use, as the scale and the kernel size can be set independently.
For MATLAB, I found a helpful guideline in this SO thread.
The selected answer recommends a kernel width of 6 times the scale (i.e. the 3-sigma rule), so our kernel width (and height) is 12.80 * 6 ≈ 77 pixels;
thus, a SIFT descriptor region of size 77x77 pixels.
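As a sanity check, here is that arithmetic written out as a tiny Python snippet (my own, simply restating the formula and rule of thumb above):

import math

base_scale = 1.6
intervals_per_octave = 3
octave, interval = 2, 3

# Gaussian scale relative to the original image size
scale = base_scale * 2 ** (octave + interval / intervals_per_octave)   # 12.8

# 6-sigma rule of thumb for the kernel width, and hence the descriptor region
region = math.ceil(6 * scale)                                          # 77
print(scale, region)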
Meanwhile, the OpenCV implementation appears to leave the size of the kernel to be determined by OpenCV's own built-in Gaussian Blur function.
Line 246 of OpenCV's code leaves the Gaussian Blur function's ksize parameter as zeroes,
and the official docs only state that the kernel size will be "computed from sigma", never defining how it is actually calculated...
Finally, for Rob's implementation, I have to admit that I couldn't quite understand what was happening in this final step. ¯\_(ツ)_/¯
In lines 1026-1112 Rob defines the code below, which shows how he calculates the orientation histogram for the SIFT descriptor.
The code shows that he defines a radius and uses nested for-loops over i and j to iterate through the square region around the keypoint, located at point (r, c).
Yet what I don't really understand is:
How he defines radius, with the Gaussian scale scl multiplied by the constant SIFT_DESCR_SCL_FCTR = 3.0,
as well as the expression hist_width * sqrt(2) * ( d + 1.0 ) * 0.5 + 0.5, where d = SIFT_DESCR_WIDTH = 4:
hist_width = SIFT_DESCR_SCL_FCTR * scl;
radius = hist_width * sqrt(2) * ( d + 1.0 ) * 0.5 + 0.5;
for( i = -radius; i <= radius; i++ )
    for( j = -radius; j <= radius; j++ )
    {
        /*
        Calculate sample's histogram array coords rotated relative to ori.
        Subtract 0.5 so samples that fall e.g. in the center of row 1 (i.e.
        r_rot = 1.5) have full weight placed in row 1 after interpolation.
        */
        c_rot = ( j * cos_t - i * sin_t ) / hist_width;
        r_rot = ( j * sin_t + i * cos_t ) / hist_width;
        rbin = r_rot + d / 2 - 0.5;
        cbin = c_rot + d / 2 - 0.5;

        if( rbin > -1.0 && rbin < d && cbin > -1.0 && cbin < d )
            if( calc_grad_mag_ori( img, r + i, c + j, &grad_mag, &grad_ori ))
            {
                grad_ori -= ori;
                while( grad_ori < 0.0 )
                    grad_ori += PI2;
                while( grad_ori >= PI2 )
                    grad_ori -= PI2;

                obin = grad_ori * bins_per_rad;
                w = exp( -(c_rot * c_rot + r_rot * r_rot) / exp_denom );
                interp_hist_entry( hist, rbin, cbin, obin, grad_mag * w, d, n );
            }
    }
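To get a feel for the numbers, here is the radius formula evaluated in Python (my own snippet), assuming scl is the octave-relative Gaussian scale from the example above, 1.6 * 2^(3/3) = 3.2:

import math

SIFT_DESCR_SCL_FCTR = 3.0
d = 4                     # SIFT_DESCR_WIDTH
scl = 3.2                 # assumed octave-relative Gaussian scale

hist_width = SIFT_DESCR_SCL_FCTR * scl                       # 9.6
radius = hist_width * math.sqrt(2) * (d + 1.0) * 0.5 + 0.5   # ~34.4
print(hist_width, int(radius))   # radius truncates to 34, i.e. roughly a 69x69 sample window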
But regardless of how the exact size of the region is calculated, I think the general concept is the same: the region size is derived from the original Gaussian scale.
Besides, given that the features are supposed to be "weighted by a Gaussian window" (original paper, section 6.1, page 15),
as long as the region you define is large enough to contain most of the meaningful orientation histograms, you are fine.
In summary:
The SIFT descriptor is calculated from the halved & blurred image of the same octave/interval as the keypoint (OpenSIFT)
The region for the SIFT descriptor is a square shape surrounding the keypoint (medium)(image)
The region size is calculated based on the Gaussian kernel scale; though the exact method of calculation can vary, an easy rule of thumb is "width of 6 times the kernel scale" (thread)
I'm working through a mobilenetv2ssdlite PyTorch model and am confused about the output of the SSD's regression heads. To my understanding, each bounding box has 4 outputs describing the location of the box, deltaCx, deltaCy, deltaX, deltaY, with respect to a default bounding box/prior/anchor. However, in the model, the regressional location results of the SSD are converted into a form relative to the prior's size with the following calculations:
$$predicted\_center * center\_variance = \frac{real\_center - prior\_center}{prior\_hw}$$
$$exp(predicted\_hw * size\_variance) = \frac{real\_hw}{prior\_hw}$$
Why is there a division by prior_hw for the centers if the output of the SSD is the offset of the centre of the predicted bounding box relative to the anchor? Why is it not predicted_center * center_variance = real_center - prior_center? Is it actually the case that the output of the SSD is the offset with respect to the anchor's height/width? I've included the entire function for reference:
def convert_locations_to_boxes(locations, priors, center_variance,
                               size_variance):
    """Convert regressional location results of SSD into boxes in the form of (center_x, center_y, h, w).

    The conversion:
        $$predicted\_center * center\_variance = \frac {real\_center - prior\_center} {prior\_hw}$$
        $$exp(predicted\_hw * size\_variance) = \frac {real\_hw} {prior\_hw}$$
    We do it in the inverse direction here.

    Args:
        locations (batch_size, num_priors, 4): the regression output of SSD. It will contain the outputs as well.
        priors (num_priors, 4) or (batch_size/1, num_priors, 4): prior boxes.
        center_variance: a float used to change the scale of center.
        size_variance: a float used to change of scale of size.
    Returns:
        boxes: priors: [[center_x, center_y, h, w]]. All the values
            are relative to the image size.
    """
    # priors can have one dimension less.
    if priors.dim() + 1 == locations.dim():
        priors = priors.unsqueeze(0)
    return torch.cat([
        locations[..., :2] * center_variance * priors[..., 2:] + priors[..., :2],
        torch.exp(locations[..., 2:] * size_variance) * priors[..., 2:]
    ], dim=locations.dim() - 1)
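For illustration, here is a tiny decoding example (my own, with made-up numbers; the variances 0.1 and 0.2 are values commonly used with this kind of code, not something stated above):

import torch

locations = torch.tensor([[[0.5, -0.5, 0.2, 0.1]]])   # (batch=1, num_priors=1, 4)
priors = torch.tensor([[0.5, 0.5, 0.2, 0.2]])         # (num_priors=1, 4) as (cx, cy, h, w)

boxes = convert_locations_to_boxes(locations, priors,
                                   center_variance=0.1, size_variance=0.2)
# centers: 0.5 * 0.1 * 0.2 + 0.5 = 0.51 and -0.5 * 0.1 * 0.2 + 0.5 = 0.49
# sizes:   exp(0.2 * 0.2) * 0.2 ≈ 0.208 and exp(0.1 * 0.2) * 0.2 ≈ 0.204
print(boxes)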
Thanks
I have a question about tracking a spot on a turning circle. As you can see in the image, I am trying to calculate x2, and the only known parameters are θ1, L and x1. The challenge is to track that spot on each turn of the circle, where each step size is θ1.
The calculation which gives approximately correct answer is:
x2 = x1 - (L/2 - L/2 * cos(θ1))
Spot Tracking
The problem is that, as the circle turns, x1 deviates more and more from the correct answer. Is there any way to calculate θ2 as the circle turns?
Hint:
The spot motion is described by
X = Xc + r cos Θ
Y = Yc + r sin Θ
Hence the angle seen from the origin,
φ = arctan((Yc + r sin Θ)/(Xc + r cos Θ)).
Notice that your problem is indeterminate, as the center of the circle is free to move at distance L from the origin, giving different intersections with the vertical at x1.
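As a rough illustration of the hint (my own snippet, with assumed values for the circle center (Xc, Yc) and spot radius r), stepping the spot by θ1 per turn:

import math

Xc, Yc, r = 4.0, 3.0, 1.0      # assumed circle center and spot radius
theta1 = math.radians(10.0)    # assumed step size per turn

theta = 0.0
for turn in range(5):
    X = Xc + r * math.cos(theta)
    Y = Yc + r * math.sin(theta)
    phi = math.atan2(Y, X)     # angle of the spot as seen from the origin
    print(f"turn {turn}: X={X:.3f}, Y={Y:.3f}, phi={math.degrees(phi):.2f} deg")
    theta += theta1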
The Project Tango C API documentation says that the TANGO_CALIBRATION_POLYNOMIAL_3_PARAMETERS lens distortion is modeled as:
x_corr_px = x_px (1 + k1 * r2 + k2 * r4 + k3 * r6)
y_corr_px = y_px (1 + k1 * r2 + k2 * r4 + k3 * r6)
That is, the undistorted coordinates are a power series function of the distorted coordinates. There is another definition in the Java API, but that description isn't detailed enough to tell which direction the function maps.
I've had a lot of trouble getting things to register properly, and I suspect that the mapping may actually go in the opposite direction, i.e. the distorted coordinates are a power series of the undistorted coordinates. If the camera calibration was produced using OpenCV, then the cause of the problem may be that the OpenCV documentation contradicts itself. The easiest description to find and understand is the OpenCV camera calibration tutorial, which does agree with the Project Tango docs:
But on the other hand, the OpenCV API documentation specifies that the mapping goes the other way:
My experiments with OpenCV show that its API documentation appears correct and the tutorial is wrong. A positive k1 (with all other distortion parameters set to zero) means pincushion distortion, and a negative k1 means barrel distortion. This matches what Wikipedia says about the Brown-Conrady model and will be opposite from the Tsai model. Note that distortion can be modeled either way depending on what makes the math more convenient. I opened a bug against OpenCV for this mismatch.
So my question: Is the Project Tango lens distortion model the same as the one implemented in OpenCV (documentation notwithstanding)?
Here's an image I captured from the color camera (slight pincushioning is visible):
And here's the camera calibration reported by the Tango service:
distortion = {double[5]#3402}
[0] = 0.23019999265670776
[1] = -0.6723999977111816
[2] = 0.6520439982414246
[3] = 0.0
[4] = 0.0
calibrationType = 3
cx = 638.603
cy = 354.906
fx = 1043.08
fy = 1043.1
cameraId = 0
height = 720
width = 1280
Here's how to undistort with OpenCV in python:
>>> import cv2
>>> import numpy
>>> src = cv2.imread('tango00042.png')
>>> d = numpy.array([0.2302, -0.6724, 0, 0, 0.652044])
>>> m = numpy.array([[1043.08, 0, 638.603], [0, 1043.1, 354.906], [0, 0, 1]])
>>> h,w = src.shape[:2]
>>> mDst, roi = cv2.getOptimalNewCameraMatrix(m, d, (w,h), 1, (w,h))
>>> dst = cv2.undistort(src, m, d, None, mDst)
>>> cv2.imwrite('foo.png', dst)
And that produces this, which is maybe a bit overcorrected at the top edge but much better than my attempts with the reverse model:
The Tango C-API docs state that (x_corr_px, y_corr_px) is the "corrected output position". This corrected output position then needs to be scaled by the focal length and offset by the center of projection to correspond to distorted pixel coordinates.
So, to project a point onto an image, you would have to:
Transform the 3D point so that it is in the frame of the camera
Convert the point into normalized image coordinates (x, y)
Calculate r2, r4, r6 for the normalized image coordinates (r2 = x*x + y*y)
Compute (x_corr_px, y_corr_px) based on the mentioned equations:
x_corr_px = x (1 + k1 * r2 + k2 * r4 + k3 * r6)
y_corr_px = y (1 + k1 * r2 + k2 * r4 + k3 * r6)
Compute distorted coordinates
x_dist_px = x_corr_px * fx + cx
y_dist_px = y_corr_px * fy + cy
Draw (x_dist_px, y_dist_px) on the original, distorted image buffer.
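Putting the steps above into a short Python sketch (my own code, using the intrinsics and k1, k2, k3 values quoted in the question, and an assumed 3D point already expressed in the camera frame):

X, Y, Z = 0.1, -0.05, 1.0                    # assumed 3D point in the camera frame
fx, fy, cx, cy = 1043.08, 1043.1, 638.603, 354.906
k1, k2, k3 = 0.2302, -0.6724, 0.652044

x, y = X / Z, Y / Z                          # normalized image coordinates
r2 = x * x + y * y
scale = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3

x_corr, y_corr = x * scale, y * scale        # "corrected" (still normalized) coordinates
x_dist_px = x_corr * fx + cx                 # distorted pixel coordinates
y_dist_px = y_corr * fy + cy
print(x_dist_px, y_dist_px)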
This also means that the corrected coordinates are the normalized coordinates scaled by a power series of the normalized image coordinates' magnitude. (this is the opposite of what the question suggests)
Looking at the implementation of cvProjectPoints2 in OpenCV (see [opencv]/modules/calib3d/src/calibration.cpp), the "Poly3" distortion in OpenCV is being applied the same direction as in Tango. All 3 versions (Tango Docs, OpenCV Tutorials, OpenCV API) are consistent and correct.
Good luck, and hopefully this helps!
(Update: Taking a closer look at the code, it looks like the corrected coordinates and distorted coordinates are not the same. I've removed the incorrect parts of my response; the remaining parts of this answer are still correct.)
Maybe it's not the right place to post, but I really want to share the readable version of code used in OpenCV to actually correct the distortion.
I'm sure that I'm not the only one who needs x_corrected and y_corrected and fails to find an easy and understandable formula.
I've rewritten the essential part of cv2.undistortPoints in Python, and you may notice that the correction is performed iteratively. This is important, because there is no closed-form solution for the 9th-degree polynomial, and all we can do is apply its reversed version several times to get a numerical solution.
def myUndistortPoint(point, CM, DC):
    # point: distorted pixel coordinates (x0, y0)
    # CM: 3x3 camera matrix, DC: 1x8 distortion coefficients
    x0, y0 = point
    [[k1, k2, p1, p2, k3, k4, k5, k6]] = DC
    fx, _, cx = CM[0]
    _, fy, cy = CM[1]

    # Normalize the distorted pixel coordinates
    x = x_src = (x0 - cx) / fx
    y = y_src = (y0 - cy) / fy

    # Iteratively invert the distortion model
    for _ in range(5):
        r2 = x**2 + y**2
        r4 = r2**2
        r6 = r2 * r4
        rad_dist = (1 + k4*r2 + k5*r4 + k6*r6) / (1 + k1*r2 + k2*r4 + k3*r6)
        tang_dist_x = 2*p1 * x*y + p2*(r2 + 2*x**2)
        tang_dist_y = 2*p2 * x*y + p1*(r2 + 2*y**2)
        x = (x_src - tang_dist_x) * rad_dist
        y = (y_src - tang_dist_y) * rad_dist

    # Back to pixel coordinates (no new camera matrix applied)
    x = x * fx + cx
    y = y * fy + cy
    return x, y
To speed things up, you can use only three iterations; on most cameras this will give enough precision to fit the pixels.
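For example, calling it with the camera matrix and distortion coefficients quoted earlier in this thread (my own example; the unused p1, p2 and k4..k6 slots are padded with zeros):

CM = [[1043.08, 0, 638.603],
      [0, 1043.1, 354.906],
      [0, 0, 1]]
DC = [[0.2302, -0.6724, 0, 0, 0.652044, 0, 0, 0]]

print(myUndistortPoint((1000.0, 600.0), CM, DC))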
A one-to-one point matching has already been established between the blue dots on the two images. Image2 is the distorted version of image1, and the distortion model looks like fisheye lens distortion.
The question is: is there any way to compute a transformation matrix that describes this transition, i.e. a matrix that transforms the blue dots on the first image to their corresponding blue dots on the second image?
The problem here is that we don't know the focal length (meaning the images are uncalibrated); however, we do have a perfect matching between around 200 points on the two images.
the distorted image:
I think what you're trying to do can be treated as a distortion correction problem, without the need of the rest of a classic camera calibration.
A matrix transformation is a linear one, and linear transformations always map straight lines to straight lines (http://en.wikipedia.org/wiki/Linear_map). It is apparent from the picture that the transformation is nonlinear, so you cannot describe it with a matrix operation.
That said, you can use a lens distortion model like the one used by OpenCV (http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html) and obtaining the coefficients shouldn't be very difficult. Here is what you can do in Matlab:
Call (x, y) the coordinates of an original point (top picture) and (xp, yp) the coordinates of a distorted point (bottom picture), both shifted to the center of the image and divided by a scaling factor (same for x and y) so they lie more or less in the [-1, 1] interval. The distortion model is:
x = ( xp*(1 + k1*r^2 + k2*r^4 + k3*r^6) + 2*p1*xp*yp + p2*(r^2 + 2*xp^2));
y = ( yp*(1 + k1*r^2 + k2*r^4 + k3*r^6) + 2*p2*xp*yp + p1*(r^2 + 2*yp^2));
Where
r = sqrt(xp^2 + yp^2);
You have 5 parameters: k1, k2, k3, p1, p2 for radial and tangential distortion and 200 pairs of points, so you can solve the nonlinear system.
Be sure the x, y, xp and yp arrays exist in the workspace and declare them global:
global x y xp yp
Write a function that evaluates the per-point error given a set of arbitrary distortion coefficients; say it's called 'dist':
function val = dist(var)
    global x y xp yp
    val = zeros(size(xp));

    k1 = var(1);
    k2 = var(2);
    k3 = var(3);
    p1 = var(4);
    p2 = var(5);

    r = sqrt(xp.*xp + yp.*yp);

    temp1 = x - ( xp.*(1 + k1*r.^2 + k2*r.^4 + k3*r.^6) + 2*p1*xp.*yp + p2*(r.^2 + 2*xp.^2));
    temp2 = y - ( yp.*(1 + k1*r.^2 + k2*r.^4 + k3*r.^6) + 2*p2*xp.*yp + p1*(r.^2 + 2*yp.^2));
    val = sqrt(temp1.*temp1 + temp2.*temp2);
Solve the system with 'fsolve':
[coef, fval] = fsolve(@dist, zeros(5,1));
The values in 'coef' are the distortion coefficients you're looking for. To correct the distortion of new points (xp, yp) not present in the original set, use the equations:
r = sqrt(xp.*xp + yp.*yp);
x_corr = xp.*(1 + k1*r.^2 + k2*r.^4 + k3*r.^6) + 2*p1*xp.*yp + p2*(r.^2 + 2*xp.^2);
y_corr = yp.*(1 + k1*r.^2 + k2*r.^4 + k3*r.^6) + 2*p2*xp.*yp + p1*(r.^2 + 2*yp.^2);
Results will be shifted to the center of the image and scaled by the factor you used above.
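If you prefer Python, here is a rough equivalent of the fitting step using scipy.optimize.least_squares (my own sketch; the synthetic points below just stand in for the real ~200 correspondences):

import numpy as np
from scipy.optimize import least_squares

def residuals(var, x, y, xp, yp):
    # Same model as the MATLAB 'dist' function: radial (k1, k2, k3) plus
    # tangential (p1, p2) distortion, with r computed from (xp, yp).
    k1, k2, k3, p1, p2 = var
    r2 = xp**2 + yp**2
    radial = 1 + k1*r2 + k2*r2**2 + k3*r2**3
    return np.concatenate([
        x - (xp*radial + 2*p1*xp*yp + p2*(r2 + 2*xp**2)),
        y - (yp*radial + 2*p2*xp*yp + p1*(r2 + 2*yp**2)),
    ])

# Synthetic stand-in for the matched, centered and scaled points.
rng = np.random.default_rng(0)
xp = rng.uniform(-1, 1, 200)
yp = rng.uniform(-1, 1, 200)
true = np.array([0.1, -0.05, 0.01, 0.001, -0.002])   # k1, k2, k3, p1, p2
fwd = -residuals(true, 0.0, 0.0, xp, yp)             # forward model output
x, y = fwd[:200], fwd[200:]

fit = least_squares(residuals, np.zeros(5), args=(x, y, xp, yp))
print(fit.x)   # should be close to `true`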
Notes:
Coordinates must be shifted to the center of the image as the distortion is symmetric with respect to it.
It shouldn't be necessary to normalize to the interval [-1, 1], but it is common to do so, so that the distortion coefficients obtained are more or less of the same order of magnitude (working with powers 2, 4 and 6 of raw pixel coordinates would need very small coefficients).
This method doesn't require the points in the image to be in a uniform grid.