How can I calculate distance from camera to a point on a ground plane from an image?
I have the intrinsic parameters of the camera and the position (height, pitch).
Is there any OpenCV function that can estimate that distance?
You can use undistortPoints to compute the rays backprojecting the pixels, but that API is rather hard to use for your purpose. It may be easier to do the calculation "by hand" in your code. Doing it at least once will also help you understand what exactly that API is doing.
Express your "position (height, pitch)" of the camera as a rotation matrix R and a translation vector t, representing the coordinate transform from the origin of the ground plane to the camera. That is, given a point in ground plane coordinates Pg = [Xg, Yg, Zg], its coordinates in camera frame are given by
Pc = R * Pg + t
The camera center is Cc = [0, 0, 0] in camera coordinates. In ground coordinates it is then:
Cg = inv(R) * (-t) = -R' * t
where inv(R) is the inverse of R, R' is its transpose, and the last equality is due to R being an orthogonal matrix.
Let's assume, for simplicity, that the the ground plane is Zg = 0.
Let K be the matrix of intrinsic parameters. Given a pixel q = [u, v], write it in homogeneous image coordinates Q = [u, v, 1]. Its location in camera coordinates is
Qc = Ki * Q
where Ki = inv(K) is the inverse of the intrinsic parameters matrix. The same point in world coordinates is then
Qg = R' * Qc + Cg
All the points Pg = [Xg, Yg, Zg] that belong to the ray from the camera center through that pixel, expressed in ground coordinates, are then on the line
Pg = Cg + lambda * (Qg - Cg)
for lambda going from 0 to positive infinity. This last formula represents three equations in ground XYZ coordinates, and you want to find the values of X, Y, Z and lambda where the ray intersects the ground plane. But that means Zg=0, so you have only 3 unknowns. Solve them (you recover lambda from the 3rd equation, then substitute in the first two), and you get Xg and Yg of the solution to your problem.
Related
What is the algorithm used in OpenCV function convexityDefects() to calculate the convexity defects of a contour?
Please, describe and illustrate the high-level operation of the algorithm, along with its inputs and outputs.
Based on the documentation, the input are two lists of coordinates:
contour defining the original contour (red on the image below)
convexhull defining the convex hull corresponding to that contour (blue on the image below)
The algorithm works in the following manner:
If the contour or the hull contain 3 or less points, then the contour is always convex, and no more processing is needed. The algorithm assures that both the contour and the hull are accessed in the same orientation.
N.B.: In further explanation I assume they are in the same orientation, and ignore the details regarding representation of the floating point depth as an integer.
Then for each pair of adjacent hull points (H[i], H[i+1]), defining one edge of the convex hull, calculate the distance from the edge for each point on the contour C[n] that lies between H[i] and H[i+1] (excluding C[n] == H[i+1]). If the distance is greater than zero, then a defect is present. When a defect is present, record i, i+1, the maximum distance and the index (n) of the contour point where the maximum located.
Distance is calculated in the following manner:
dx0 = H[i+1].x - H[i].x
dy0 = H[i+1].y - H[i].y
if (dx0 is 0) and (dy0 is 0) then
scale = 0
else
scale = 1 / sqrt(dx0 * dx0 + dy0 * dy0)
dx = C[n].x - H[i].x
dy = C[n].y - H[i].y
distance = abs(-dy0 * dx + dx0 * dy) * scale
It may be easier to visualize in terms of vectors:
C: defect vector from H[i] to C[n]
H: hull edge vector from H[i] to H[i+1]
H_rot: hull edge vector H rotated 90 degrees
U_rot: unit vector in direction of H_rot
H components are [dx0, dy0], so rotating 90 degrees gives [-dy0, dx0].
scale is used to find U_rot from H_rot, but because divisions are more computationally expensive than multiplications, the inverse is used as an optimization. It's also pre-calculated before the loop over C[n] to avoid recomputing each iteration.
|H| = sqrt(dx0 * dx0 + dy0 * dy0)
U_rot = H_rot / |H| = H_rot * scale
Then, a dot product between C and U_rot gives the perpendicular distance from the defect point to the hull edge, and abs() is used to get a positive magnitude in any orientation.
distance = abs(U_rot.C) = abs(-dy0 * dx + dx0 * dy) * scale
In the scenario depicted on the above image, in first iteration, the edge is defined by H[0] and H[1]. The contour points tho examine for this edge are C[0], C[1], and C[2] (since C[3] == H[1]).
There are defects at C[1] and C[2]. The defect at C[1] is the deepest, so the algorithm will record (0, 1, 1, 50).
The next edge is defined by H[1] and H[2], and corresponding contour point C[3]. No defect is present, so nothing is recorded.
The next edge is defined by H[2] and H[3], and corresponding contour point C[4]. No defect is present, so nothing is recorded.
Since C[5] == H[3], the last contour point can be ignored -- there can't be a defect there.
I have a relative camera pose estimation problem where I am looking at a scene with differently oriented cameras spaced a certain distance apart. Initially, I am computing the essential matrix using the 5 point algorithm and decomposing it to get the R and t of camera 2 w.r.t camera 1.
I thought it would be a good idea to do a check by triangulating the two sets of image points into 3D, and then running solvePnP on the 3D-2D correspondences, but the result I get from solvePnP is way off. I am trying to do this to "refine" my pose as the scale can change from one frame to another. Anyway, In one case, I had a 45 degree rotation between camera 1 and camera 2 along the Z axis, and the epipolar geometry part gave me this answer:
Relative camera rotation is [1.46774, 4.28483, 40.4676]
Translation vector is [-0.778165583410928; -0.6242059242696293; -0.06946429947410336]
solvePnP, on the other hand..
Camera1: rvecs [0.3830144497209735; -0.5153903947692436; -0.001401186630803216]
tvecs [-1777.451836911453; -1097.111339375749; 3807.545406775675]
Euler1 [24.0615, -28.7139, -6.32776]
Camera2: rvecs [1407374883553280; 1337006420426752; 774194163884064.1] (!!)
tvecs[1.249151852575814; -4.060149502748567; -0.06899980661249146]
Euler2 [-122.805, -69.3934, 45.7056]
Something is troublingly off with the rvecs of camera2 and tvec of camera 1. My code involving the point triangulation and solvePnP looks like this:
points1.convertTo(points1, CV_32F);
points2.convertTo(points2, CV_32F);
// Homogenize image points
points1.col(0) = (points1.col(0) - pp.x) / focal;
points2.col(0) = (points2.col(0) - pp.x) / focal;
points1.col(1) = (points1.col(1) - pp.y) / focal;
points2.col(1) = (points2.col(1) - pp.y) / focal;
points1 = points1.t(); points2 = points2.t();
cv::triangulatePoints(P1, P2, points1, points2, points3DH);
cv::Mat points3D;
convertPointsFromHomogeneous(Mat(points3DH.t()).reshape(4, 1), points3D);
cv::solvePnP(points3D, points1.t(), K, noArray(), rvec1, tvec1, 1, CV_ITERATIVE );
cv::solvePnP(points3D, points2.t(), K, noArray(), rvec2, tvec2, 1, CV_ITERATIVE );
And then I am converting the rvecs through Rodrigues to get the Euler angles: but since rvecs and tvecs themselves seem to be wrong, I feel something's wrong with my process. Any pointers would be helpful. Thanks!
I have two sets of data associated with a video sequence. One contains relative rotation and translation data generated using one algorithm. The other is comprised of ground-truth extrinsic matrices associated with each frame.
I would like to compare the data-sets to determine the disparity between them. My question is, how can I derive the relative translation and rotation from the two extrinsic camera matrices?
If you have camera1 pose P1 = [R1|T1] and camera2 pose P2 = [R2|T2] then P1to2 = P2 * P1^-1.
Intuitively, imagine a simple case in which both cameras have translation zero, camera1 has rotation on X axis +30 degrees and camera2 rotation on X axis of +60 degrees.
P1 = [R1|0] P2 = [R2|0]
So they both differ of a +30 degrees rotation on the X axis:
P1to2 = P2 * P1^-1 = [R2|0] * [R1|0]^-1 = [R2|0] * [R1^1|0]
R2 * R1^1 = 60 - 30 on X = rotation of +30 degrees on X
The extrinsic matrix is comprised of the rotation matrix 3x3 and the translation vector 3x1.
So, M 3x4, is simply a concatenation of the two [R t]. Unless I misunderstand your question, getting the rotation and translation is trivial.
I am looking at the code for Hough transformation in image segmentation. The following code is from Computer Vision by Linda Shapiro. Can somebody tell me what is quantize_angle and how can I compute it?
The Hough transform looks for straight lines (or other features) in an image and represents these features as points in a different 2D coordinate system, where one axis represents the angle θ of a detected line, and the other represents the distance δ from this line to the centre of the image.
Source: Wikipedia
To produce a Hough transform of finite dimensions, both θ and δ have to be quantized. For example, if θ lies in the range (0 ≤ θ < 2π), then you could map it to the range 0–255 by a function such as the following:
int quantize_angle(float theta) {
int q = floor(theta * 128.0 / 3.141592654 + 0.5);
return q % 256;
}
This will result in a Hough transform that is 256 pixels wide.
-- Update 2 --
The following article is really useful (although it is using Python instead of C++) if you are using a single camera to calculate the distance: Find distance from camera to object/marker using Python and OpenCV
Best link is Stereo Webcam Depth Detection. The implementation of this open source project is really clear.
Below is the original question.
For my project I am using two camera's (stereo vision) to track objects and to calculate the distance. I calibrated them with the sample code of OpenCV and generated a disparity map.
I already implemented a method to track objects based on color (this generates a threshold image).
My question: How can I calculate the distance to the tracked colored objects using the disparity map/ matrix?
Below you can find a code snippet that gets the x,y and z coordinates of each pixel. The question: Is Point.z in cm, pixels, mm?
Can I get the distance to the tracked object with this code?
Thank you in advance!
cvReprojectImageTo3D(disparity, Image3D, _Q);
vector<CvPoint3D32f> PointArray;
CvPoint3D32f Point;
for (int y = 0; y < Image3D->rows; y++) {
float *data = (float *)(Image3D->data.ptr + y * Image3D->step);
for (int x = 0; x < Image3D->cols * 3; x = x + 3)
{
Point.x = data[x];
Point.y = data[x+1];
Point.z = data[x+2];
PointArray.push_back(Point);
//Depth > 10
if(Point.z > 10)
{
printf("%f %f %f", Point.x, Point.y, Point.z);
}
}
}
cvReleaseMat(&Image3D);
--Update 1--
For example I generated this thresholded image (of the left camera). I almost have the same of the right camera.
Besides the above threshold image, the application generates a disparity map. How can I get the Z-coordinates of the pixels of the hand in the disparity map?
I actually want to get all the Z-coordinates of the pixels of the hand to calculate the average Z-value (distance) (using the disparity map).
See this links: OpenCV: How-to calculate distance between camera and object using image?, Finding distance from camera to object of known size, http://answers.opencv.org/question/5188/measure-distance-from-detected-object-using-opencv/
If it won't solve you problem, write more details - why it isn't working, etc.
The math for converting disparity (in pixels or image width percentage) to actual distance is pretty well documented (and not very difficult) but I'll document it here as well.
Below is an example given a disparity image (in pixels) and an input image width of 2K (2048 pixels across) image:
Convergence Distance is determined by the rotation between camera lenses. In this example it will be 5 meters. Convergence distance of 5 (meters) means that the disparity of objects 5 meters away is 0.
CD = 5 (meters)
Inverse of convergence distance is: 1 / CD
IZ = 1/5 = 0.2M
Size of camera's sensor in meters
SS = 0.035 (meters) //35mm camera sensor
The width of a pixel on the sensor in meters
PW = SS/image resolution = 0.035 / 2048(image width) = 0.00001708984
The focal length of your cameras in meters
FL = 0.07 //70mm lens
InterAxial distance: The distance from the center of left lens to the center of right lens
IA = 0.0025 //2.5mm
The combination of the physical parameters of your camera rig
A = FL * IA / PW
Camera Adjusted disparity: (For left view only, right view would use positive [disparity value])
AD = 2 * (-[disparity value] / A)
From here you can compute actual distance using the following equation:
realDistance = 1 / (IZ – AD)
This equation only works for "toe-in" camera systems, parallel camera rigs will use a slightly different equation to avoid infinity values, but I'll leave it at this for now. If you need the parallel stuff just let me know.
if len(puntos) == 2:
x1, y1, w1, h1 = puntos[0]
x2, y2, w2, h2 = puntos[1]
if x1 < x2:
distancia_pixeles = abs(x2 - (x1+w1))
distancia_cm = (distancia_pixeles*29.7)/720
cv2.putText(imagen_A4, "{:.2f} cm".format(distancia_cm), (x1+w1+distancia_pixeles//2, y1-30), 2, 0.8, (0,0,255), 1,
cv2.LINE_AA)
cv2.line(imagen_A4,(x1+w1,y1-20),(x2, y1-20),(0, 0, 255),2)
cv2.line(imagen_A4,(x1+w1,y1-30),(x1+w1, y1-10),(0, 0, 255),2)
cv2.line(imagen_A4,(x2,y1-30),(x2, y1-10),(0, 0, 255),2)
else:
distancia_pixeles = abs(x1 - (x2+w2))
distancia_cm = (distancia_pixeles*29.7)/720
cv2.putText(imagen_A4, "{:.2f} cm".format(distancia_cm), (x2+w2+distancia_pixeles//2, y2-30), 2, 0.8, (0,0,255), 1,
cv2.LINE_AA)
cv2.line(imagen_A4,(x2+w2,y2-20),(x1, y2-20),(0, 0, 255),2)
cv2.line(imagen_A4,(x2+w2,y2-30),(x2+w2, y2-10),(0, 0, 255),2)
cv2.line(imagen_A4,(x1,y2-30),(x1, y2-10),(0, 0, 255),2)
cv2.imshow('imagen_A4',imagen_A4)
cv2.imshow('frame',frame)
k = cv2.waitKey(1) & 0xFF
if k == 27:
break
cap.release()
cv2.destroyAllWindows()
I think this is a good way to measure the distance between two objects