Our AR device is based on a camera with pretty strong optical zoom. We measure the distortion of this camera using classical camera-calibration tools (checkerboards), both through OpenCV and the GML Camera Calibration tools.
At higher zoom levels (I'll use 249 out of 255 as an example) we measure the following camera parameters at full HD resolution (1920x1080):
fx = 24545.4316
fy = 24628.5469
cx = 924.3162
cy = 440.2694
For the radial and tangential distortion we measured 4 values:
k1 = 5.423406
k2 = -2964.24243
p1 = 0.004201721
p2 = 0.0162647516
We are not sure how to interpret (read: implement) those extremely large values for k1 and k2. Using OpenCV's classic "undistort" operation to rectify the image using these values seems to work well. Unfortunately this is (much) too slow for realtime usage.
The thumbnails below look similar; clicking them will display the full-size images, where you can spot the difference:
Camera footage
Undistorted using OpenCV
That's why we want to take the opposite approach: leave the camera footage distorted and apply a similar distortion to our 3D scene using shaders. Following the OpenCV documentation, and this accepted answer in particular, the distorted position for a corner point (0, 0) would be
// To relative coordinates
double x = (point.X - cx) / fx; // -960 / 24545 = -0.03911
double y = (point.Y - cy) / fy; // -540 / 24628 = -0.02193
double r2 = x*x + y*y; // 0.002010
// Radial distortion
// -0.03911 * (1 + 5.423406 * 0.002010 + -2964.24243 * 0.002010 * 0.002010) = -0.039067
double xDistort = x * (1 + k1 * r2 + k2 * r2 * r2);
// -0.02193 * (1 + 5.423406 * 0.002010 + -2964.24243 * 0.002010 * 0.002010) = -0.021906
double yDistort = y * (1 + k1 * r2 + k2 * r2 * r2);
// Tangential distortion
... left out for brevity
// Back to absolute coordinates.
xDistort = xDistort * fx + cx; // -0.039067 * 24545.4316 + 924.3162 = -34.6002 !!!
yDistort = yDistort * fy + cy; // -0.021906 * 24628.5469 + 440.2694 = -99.2435 !!!
These large pixel displacements (roughly 34 and 100 pixels at the upper-left corner) seem excessive and do not correspond with the undistorted image OpenCV generates.
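For reference, here is a small sanity check we can run to see where OpenCV itself places that corner point when the same camera matrix and distortion vector are applied through cv2.projectPoints (nothing beyond standard OpenCV/numpy; the variable names are just for illustration):

import numpy as np
import cv2

K = np.array([[24545.4316, 0.0, 924.3162],
              [0.0, 24628.5469, 440.2694],
              [0.0, 0.0, 1.0]])
dist = np.array([5.423406, -2964.24243, 0.004201721, 0.0162647516])  # k1, k2, p1, p2

# Normalized (undistorted) coordinates of pixel (0, 0)
x = (0.0 - K[0, 2]) / K[0, 0]
y = (0.0 - K[1, 2]) / K[1, 1]

# Treat it as a point on the z = 1 plane and let OpenCV apply its full distortion model
obj = np.array([[[x, y, 1.0]]])
img_pts, _ = cv2.projectPoints(obj, np.zeros(3), np.zeros(3), K, dist)
print(img_pts)  # where OpenCV itself places the distorted corner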
So the specific question is: what is wrong with the way we interpreted the values we measured, and what should the correct code for distortion be?
I'm using in OpenCV the method
triangulatePoints(P1,P2,x1,x2)
to get the 3D coordinates of a point by its image points x1/x2 in the left/right image and the projection matrices P1/P2.
I've already studied epipolar geometry and know most of the maths behind it. But how does this algorithm mathematically obtain the 3D coordinates?
Here are some ideas which, to the best of my knowledge, should at least work theoretically.
Using the camera equation ax = PX, we can express the two image point correspondences as
ap = PX
bq = QX
where p = [p1 p2 1]' and q = [q1 q2 1]' are the matching image points to the 3D point X = [X Y Z 1]' and P and Q are the two projection matrices.
We can expand these two equations and rearrange the terms to form an Ax = b system as shown below
p11.X + p12.Y + p13.Z - a.p1 + b.0 = -p14
p21.X + p22.Y + p23.Z - a.p2 + b.0 = -p24
p31.X + p32.Y + p33.Z - a.1 + b.0 = -p34
q11.X + q12.Y + q13.Z + a.0 - b.q1 = -q14
q21.X + q22.Y + q23.Z + a.0 - b.q2 = -q24
q31.X + q32.Y + q33.Z + a.0 - b.1 = -q34
from which we get
A = [ p11  p12  p13  -p1    0
      p21  p22  p23  -p2    0
      p31  p32  p33   -1    0
      q11  q12  q13    0  -q1
      q21  q22  q23    0  -q2
      q31  q32  q33    0   -1 ]

x = [X Y Z a b]' and b = -[p14 p24 p34 q14 q24 q34]'.

Now we can solve for x to find the 3D coordinates.
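To make this first approach concrete, here is a minimal numpy sketch (the function name is just illustrative; it assumes exact correspondences and solves the slightly overdetermined 6x5 system in a least-squares sense):

import numpy as np

def triangulate_inhomogeneous(P, Q, p, q):
    # Build the 6x5 system A [X Y Z a b]' = rhs described above
    A = np.zeros((6, 5))
    A[:3, :3] = P[:, :3]
    A[3:, :3] = Q[:, :3]
    A[:3, 3] = -np.array([p[0], p[1], 1.0])   # coefficients of the scale a
    A[3:, 4] = -np.array([q[0], q[1], 1.0])   # coefficients of the scale b
    rhs = -np.concatenate([P[:, 3], Q[:, 3]])
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return sol[:3]  # X, Y, Z (sol[3:] holds the projective scales a and b)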
Another approach is to use the fact, from the camera equation ax = PX, that x and PX are parallel, so their cross product must be the zero vector. Using
p x PX = 0
q x QX = 0
we can construct a system of the form Ax = 0 and solve for x.
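Continuing with the same numpy import, here is a sketch of this homogeneous (cross-product) variant; it is close in spirit to how triangulatePoints is usually described, though not necessarily byte-for-byte what OpenCV implements (again, the name is just illustrative):

def triangulate_homogeneous(P, Q, p, q):
    # Each cross-product constraint p x PX = 0 contributes two independent rows
    A = np.array([
        p[0] * P[2] - P[0],
        p[1] * P[2] - P[1],
        q[0] * Q[2] - Q[0],
        q[1] * Q[2] - Q[1],
    ])
    # The solution is the right singular vector belonging to the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # back to inhomogeneous coordinates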
I am trying to recover the equation of the original signal using a DFT.
DFT on (1,0,0,0) gives (1,1,1,1)
So what is the equation of the wave representing the dataset (1,0,0,0)? I mean something like the following.
f(t)=sin(t)+0.13sin(3t)
The DFT of an impulse is just a flat spectrum in the frequency domain, so you have amplitude 1 at each frequency bin. The reconstruction is the inverse DFT, which for N = 4 samples carries a 1/N factor (the sine terms cancel at the integer sample points):
f(t) = (1/4) [1 + cos(2πt/4) + cos(4πt/4) + cos(6πt/4)]
     = (1/4) [1 + cos(πt/2) + cos(πt) + cos(3πt/2)]
which gives back 1, 0, 0, 0 at t = 0, 1, 2, 3.
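A quick numpy check of both the spectrum and the reconstruction (purely a sanity check):

import numpy as np

x = np.array([1.0, 0.0, 0.0, 0.0])
print(np.fft.fft(x))  # [1.+0.j 1.+0.j 1.+0.j 1.+0.j]

# Evaluate the cosine-sum reconstruction at the sample points t = 0, 1, 2, 3
t = np.arange(4)
f = 0.25 * (1 + np.cos(np.pi * t / 2) + np.cos(np.pi * t) + np.cos(3 * np.pi * t / 2))
print(np.round(f, 12))  # [1. 0. 0. 0.]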
The Project Tango C API documentation says that the TANGO_CALIBRATION_POLYNOMIAL_3_PARAMETERS lens distortion is modeled as:
x_corr_px = x_px (1 + k1 * r2 + k2 * r4 + k3 * r6)
y_corr_px = y_px (1 + k1 * r2 + k2 * r4 + k3 * r6)
That is, the undistorted coordinates are a power series function of the distorted coordinates. There is another definition in the Java API, but that description isn't detailed enough to tell which direction the function maps.
I've had a lot of trouble getting things to register properly, and I suspect that the mapping may actually go in the opposite direction, i.e. the distorted coordinates are a power series of the undistorted coordinates. If the camera calibration was produced using OpenCV, then the cause of the problem may be that the OpenCV documentation contradicts itself. The easiest description to find and understand is the OpenCV camera calibration tutorial, which does agree with the Project Tango docs:
But on the other hand, the OpenCV API documentation specifies that the mapping goes the other way:
My experiments with OpenCV show that its API documentation appears correct and the tutorial is wrong. A positive k1 (with all other distortion parameters set to zero) means pincushion distortion, and a negative k1 means barrel distortion. This matches what Wikipedia says about the Brown-Conrady model and is the opposite of the Tsai model. Note that distortion can be modeled either way depending on what makes the math more convenient. I opened a bug against OpenCV for this mismatch.
So my question: Is the Project Tango lens distortion model the same as the one implemented in OpenCV (documentation notwithstanding)?
Here's an image I captured from the color camera (slight pincushioning is visible):
And here's the camera calibration reported by the Tango service:
distortion = {double[5]#3402}
[0] = 0.23019999265670776
[1] = -0.6723999977111816
[2] = 0.6520439982414246
[3] = 0.0
[4] = 0.0
calibrationType = 3
cx = 638.603
cy = 354.906
fx = 1043.08
fy = 1043.1
cameraId = 0
height = 720
width = 1280
Here's how to undistort with OpenCV in python:
>>> import cv2
>>> import numpy
>>> src = cv2.imread('tango00042.png')
>>> d = numpy.array([0.2302, -0.6724, 0, 0, 0.652044])  # Tango's [k1, k2, k3] reordered into OpenCV's [k1, k2, p1, p2, k3]
>>> m = numpy.array([[1043.08, 0, 638.603], [0, 1043.1, 354.906], [0, 0, 1]])
>>> h,w = src.shape[:2]
>>> mDst, roi = cv2.getOptimalNewCameraMatrix(m, d, (w,h), 1, (w,h))
>>> dst = cv2.undistort(src, m, d, None, mDst)
>>> cv2.imwrite('foo.png', dst)
And that produces this, which is maybe a bit overcorrected at the top edge but much better than my attempts with the reverse model:
The Tango C-API docs state that (x_corr_px, y_corr_px) is the "corrected output position". This corrected output position then needs to be scaled by the focal length and offset by the center of projection to correspond to distorted pixel coordinates.
So, to project a point onto an image, you would have to:
Transform the 3D point so that it is in the frame of the camera
Convert the point into normalized image coordinates (x, y)
Calculate r2, r4, r6 for the normalized image coordinates (r2 = x*x + y*y)
Compute (x_corr_px, y_corr_px) based on the mentioned equations:
x_corr_px = x (1 + k1 * r2 + k2 * r4 + k3 * r6)
y_corr_px = y (1 + k1 * r2 + k2 * r4 + k3 * r6)
Compute distorted coordinates
x_dist_px = x_corr_px * fx + cx
y_dist_px = y_corr_px * fy + cy
Draw (x_dist_px, y_dist_px) on the original, distorted image buffer.
This also means that the corrected coordinates are the normalized coordinates scaled by a power series in the normalized image coordinates' magnitude. (This is the opposite of what the question suggests.)
Looking at the implementation of cvProjectPoints2 in OpenCV (see [opencv]/modules/calib3d/src/calibration.cpp), the "Poly3" distortion in OpenCV is being applied the same direction as in Tango. All 3 versions (Tango Docs, OpenCV Tutorials, OpenCV API) are consistent and correct.
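For concreteness, here is a minimal Python sketch of those steps (the helper name and argument layout are just for illustration; it assumes the 3D point has already been transformed into the camera frame):

def project_poly3(X_cam, fx, fy, cx, cy, k1, k2, k3):
    X, Y, Z = X_cam
    # Step 2: normalized image coordinates
    x = X / Z
    y = Y / Z
    # Step 3: radial terms
    r2 = x * x + y * y
    r4 = r2 * r2
    r6 = r4 * r2
    # Step 4: corrected coordinates per the equations above
    scale = 1.0 + k1 * r2 + k2 * r4 + k3 * r6
    x_corr = x * scale
    y_corr = y * scale
    # Step 5: distorted pixel coordinates on the original image
    return x_corr * fx + cx, y_corr * fy + cy

# Example with the calibration reported in the question
print(project_poly3((0.1, 0.05, 1.0),
                    1043.08, 1043.1, 638.603, 354.906,
                    0.2302, -0.6724, 0.652044))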
Good luck, and hopefully this helps!
(Update: Taking a closer look at the code, it looks like the corrected coordinates and distorted coordinates are not the same. I've removed the incorrect parts of my response, and the remaining parts of this answer are still correct.)
Maybe this isn't the right place to post, but I really want to share a readable version of the code OpenCV uses to actually correct the distortion.
I'm sure that I'm not the only one who needs x_corrected and y_corrected and fails to find an easy and understandable formula.
I've rewritten the essential part of cv2.undistortPoints in Python, and you may notice that the correction is performed iteratively. This is important, because there is no closed-form solution for inverting such a high-degree polynomial, so all we can do is apply the reversed model several times to converge on a numerical solution.
def myUndistortPoint(point, CM, DC):
    x0, y0 = point                      # distorted pixel coordinates
    [[k1, k2, p1, p2, k3, k4, k5, k6]] = DC
    fx, _, cx = CM[0]
    _, fy, cy = CM[1]

    # Normalize to ideal image coordinates
    x = x_src = (x0 - cx) / fx
    y = y_src = (y0 - cy) / fy

    # Iteratively invert the distortion model
    for _ in range(5):
        r2 = x**2 + y**2
        r4 = r2**2
        r6 = r2 * r4
        rad_dist = (1 + k4*r2 + k5*r4 + k6*r6) / (1 + k1*r2 + k2*r4 + k3*r6)
        tang_dist_x = 2*p1 * x*y + p2*(r2 + 2*x**2)
        tang_dist_y = 2*p2 * x*y + p1*(r2 + 2*y**2)
        x = (x_src - tang_dist_x) * rad_dist
        y = (y_src - tang_dist_y) * rad_dist

    # Back to pixel coordinates
    x = x * fx + cx
    y = y * fy + cy
    return x, y
To speed things up, you can use only three iterations; on most cameras this gives enough precision to fit the pixels.
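As a quick sanity check (using the Tango camera matrix quoted earlier on this page purely as example numbers), the result should closely match what cv2.undistortPoints reports:

import numpy as np
import cv2

CM = np.array([[1043.08, 0.0, 638.603],
               [0.0, 1043.1, 354.906],
               [0.0, 0.0, 1.0]])
DC = np.array([[0.2302, -0.6724, 0.0, 0.0, 0.652044, 0.0, 0.0, 0.0]])

print(myUndistortPoint((100.0, 100.0), CM, DC))

# Reference result straight from OpenCV (P=CM maps back to pixel coordinates)
pts = np.array([[[100.0, 100.0]]])
print(cv2.undistortPoints(pts, CM, DC, P=CM))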
I have found this way to calculate elongation based on image moments:
#ELONGATION
def elongation(m):
x = m['mu20'] + m['mu02']
y = 4 * m['mu11']**2 + (m['mu20'] - m['mu02'])**2
return (x + y**0.5) / (x - y**0.5)
mom = cv2.moments(unicocnt, 1)
contour_elongation = elongation(mom)  # avoid shadowing the function name
How can I calculate the elongation of a convex hull?
hull = cv2.convexHull(unicocnt)
where 'unicocnt' is a contour that was taken with find contours.
convexHull can return either the indices of the hull points within the input contour or the hull points themselves, controlled by the returnPoints argument. Make sure returnPoints is True (the default in cv2), so the output is a vector of points that you can pass to cv2.moments exactly as you did with the original contour.
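So, building on the code from your question (unicocnt and elongation as defined there), a minimal sketch would be:

# Hull returned as points (returnPoints=True, the default), so it can be fed to cv2.moments
hull = cv2.convexHull(unicocnt, returnPoints=True)
hull_moments = cv2.moments(hull)
hull_elongation = elongation(hull_moments)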