OpenCV Stereo Calibration (example code questions)

At the moment I am implementing the calibration method(s) for stereo vision. I am using the OpenCV library.
There is an example in the sample folder, but I have some questions about the implementation:
What are these arrays for, and what are those CvMat variables?
// ARRAY AND VECTOR STORAGE:
double M1[3][3], M2[3][3], D1[5], D2[5];
double R[3][3], T[3], E[3][3], F[3][3];
CvMat _M1 = cvMat(3, 3, CV_64F, M1 );
CvMat _M2 = cvMat(3, 3, CV_64F, M2 );
CvMat _D1 = cvMat(1, 5, CV_64F, D1 );
CvMat _D2 = cvMat(1, 5, CV_64F, D2 );
CvMat _R = cvMat(3, 3, CV_64F, R );
CvMat _T = cvMat(3, 1, CV_64F, T );
CvMat _E = cvMat(3, 3, CV_64F, E );
CvMat _F = cvMat(3, 3, CV_64F, F );
In other examples I see this code:
//--------Find and Draw chessboard--------------------------------------------------
if((frame++ % 20) == 0)
{
//----------------CAM1-------------------------------------------------------------------------------------------------------
result1 = cvFindChessboardCorners( frame1, board_sz,&temp1[0], &count1,CV_CALIB_CB_ADAPTIVE_THRESH|CV_CALIB_CB_FILTER_QUADS);
cvCvtColor( frame1, gray_fr1, CV_BGR2GRAY );
What exactly does the if statement do? Why % 20?
Thank you in advance!
Update:
I have two questions about some implementation code: link
-1: The nx and ny variables are declared on line 18 and used for the board_sz variable on line 25. Are nx and ny the rows and columns, or the corners in the chessboard pattern? (I think they are the rows and columns, because cvSize takes width and height parameters.)
-2: What are these CvMat variables for (lines 143-146)?
CvMat _objectPoints = cvMat(1, N, CV_32FC3, &objectPoints[0] );
CvMat _imagePoints1 = cvMat(1, N, CV_32FC2, &points[0][0] );
CvMat _imagePoints2 = cvMat(1, N, CV_32FC2, &points[1][0] );
CvMat _npoints = cvMat(1, npoints.size(), CV_32S, &npoints[0] );

Each of those matrices has a meaning in epipolar geometry. They describe the relation between your two cameras in 3D space and between the images they record.
In your example, they are:
M1 - the camera intrinsics matrix of your left camera
M2 - the camera intrinsics matrix of your right camera
D1 - the distortion coefficients of your left camera
D2 - the distortion coefficients of your right camera
R - the rotation matrix from the right to your left camera
T - the translation vector from the right to your left camera
E - the essential matrix of your stereo setup
F - the fundamental matrix of your stereo setup
On the basis of these matrices, you can undistort and rectify your images, which allows you to extract the depth of a point you see in both images by way of their disparity (the difference in x, basically). Finding a point in both images is called matching, and is generally the last step after rectification.
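For a rectified pair the relation is simply Z = f * B / d. A minimal sketch (my own illustration, not from the sample; fx is the focal length in pixels and baseline is the distance between the two camera centers, in whatever unit you want the depth in):
// Minimal sketch: depth from disparity after rectification.
double depthFromDisparity(double fx, double baseline, double disparity)
{
    if (disparity <= 0.0)
        return -1.0;                   // no valid depth for this pixel
    return fx * baseline / disparity;  // Z = f * B / d
}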
Any good introduction to epipolar geometry and stereo vision will probably be better than anything I could type up here. I recommend the Learning OpenCV book from which your example code is taken and which goes into great detail explaining the basics.
The second part of your question has already been answered in a comment:
(frame++ % 20) is 0 for every 20th frame recorded from your webcam, so the code in the if-clause is executed once per 20 frames.
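In code form, the same frame-skipping pattern looks like this (a small sketch with the C++ API; cv::VideoCapture here stands in for however you grab frames):
#include <opencv2/opencv.hpp>

// Sketch: only every 20th grabbed frame is used for corner detection,
// so calibration views are spaced out instead of being near-duplicates.
void captureEveryTwentiethFrame()
{
    cv::VideoCapture cap(0);   // your webcam
    cv::Mat frame1;
    int frame = 0;
    while (cap.read(frame1))
    {
        if ((frame++ % 20) != 0)
            continue;          // skip 19 out of every 20 frames
        // ...find chessboard corners in frame1 here...
    }
}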
Response to your update:
nx and ny are the numbers of inner corners in the chessboard pattern in your calibration images. On a "normal" 8x8 chessboard, nx = ny = 7. You can see in lines 138-139 that the points of one ideal chessboard are created by offsetting nx*ny points by multiples of squareSize, the size of one square in your chessboard.
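Schematically, those lines build the ideal corner grid like this (a sketch with the C++ types, not the literal sample code; nx, ny and squareSize as in the sample):
std::vector<cv::Point3f> objectCorners;   // one "ideal" chessboard
for (int j = 0; j < ny; j++)
    for (int i = 0; i < nx; i++)
        // all corners lie on the Z = 0 plane, spaced squareSize apart
        objectCorners.push_back(cv::Point3f(i * squareSize, j * squareSize, 0.0f));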
The CvMat variables "objectPoints", "imagePoints" and "npoints" are passed into the cvStereoCalibrate function.
objectPoints contains the points of your calibration object (the chessboard)
imagePoints1/2 contain these points as seen by each of your cameras
npoints just contains the number of points in each image (as an M-by-1 matrix) - feel free to ignore it, it's not used in the OpenCV C++ API any more anyway.
Basically, cvStereoCalibrate fits the imagePoints to the objectPoints and returns 1) the distortion coefficients, 2) the intrinsic camera matrices and 3) the spatial relation of the two cameras as the rotation matrix R and translation vector T. The first are used to undistort your images, the second relate your pixel coordinates to real-world coordinates, and the third allows you to rectify your two images.
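If you ever move to the C++ API, the same call looks roughly like this (a sketch only, assuming objectPoints and imagePoints1/2 are grouped per view as vector<vector<...>>, imageSize is the calibration image resolution, and M1, D1, M2, D2 already hold each camera's intrinsics, e.g. from cv::calibrateCamera, since the default flags keep them fixed):
cv::Mat R, T, E, F;
double rms = cv::stereoCalibrate(objectPoints, imagePoints1, imagePoints2,
                                 M1, D1, M2, D2, imageSize,
                                 R, T, E, F);
// rms is the reprojection error; R and T feed straight into cv::stereoRectify.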
As a side note: I remember having trouble with the stereo calibration because the chessboard orientation could be detected differently in the left and right camera images. This shouldn't be a problem unless you have a large angle between your cameras (which isn't a great idea) or you incline your chessboards a lot (which isn't necessary), but it's still something to keep an eye on.

Related

Huge reprojection errors triangulating appropriate points on stereopair with additional GPS data

I'm solving a VSLAM task using a 2D-3D algorithm based on the OpenCV library. Now I'm trying to georeference the result using GPS data. I transform R, t of each camera and then triangulate matched points using a trivial function:
void Triangulate(const cv::KeyPoint &kp1, const cv::KeyPoint &kp2,
                 const cv::Mat &P1, const cv::Mat &P2, cv::Mat &x3D)
{
    // Linear (DLT) triangulation: stack the four constraint rows and take
    // the right singular vector for the smallest singular value.
    cv::Mat A(4, 4, CV_32F);
    A.row(0) = kp1.pt.x * P1.row(2) - P1.row(0);
    A.row(1) = kp1.pt.y * P1.row(2) - P1.row(1);
    A.row(2) = kp2.pt.x * P2.row(2) - P2.row(0);
    A.row(3) = kp2.pt.y * P2.row(2) - P2.row(1);

    cv::Mat u, w, vt;
    cv::SVD::compute(A, w, u, vt, cv::SVD::MODIFY_A | cv::SVD::FULL_UV);
    x3D = vt.row(3).t();
    x3D = x3D.rowRange(0, 3) / x3D.at<float>(3);   // dehomogenize
}
where kp1 and kp2 are the keypoints in the left and right images, and P1, P2 are the projection matrices.
I have run into a strange problem: when I shift the camera centers by some huge constant, I get big reprojection errors on points that previously triangulated fine. Is the SVD decomposition used for point triangulation sensitive to the scale of the camera centers?
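For reference, P1 and P2 are built in the usual K * [R | t] form; a simplified sketch of that construction (not my exact pipeline code; all matrices CV_32F):
cv::Mat buildProjection(const cv::Mat &K, const cv::Mat &R, const cv::Mat &t)
{
    // P = K * [R | t], a 3x4 projection matrix.
    cv::Mat Rt(3, 4, CV_32F);
    R.copyTo(Rt(cv::Rect(0, 0, 3, 3)));   // left 3x3 block: rotation
    t.copyTo(Rt(cv::Rect(3, 0, 1, 3)));   // right column: translation
    return K * Rt;
}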
It turned out to be a mistake in another part of the code. Sorry.

iOS & OpenCV: Image Registration / Alignment

I am doing a project combining multiple images, similar to HDR, in iOS. I have managed to get 3 images of different exposures through the camera, and now I want to align them, because during capture my hand must have shaken, resulting in all 3 images having slightly different alignment.
I have imported the OpenCV framework and have been exploring its functions to align/register images, but found nothing. Is there actually a function in OpenCV to achieve this? If not, are there any alternatives?
Thanks!
In OpenCV 3.0 you can use findTransformECC. I have copied this ECC Image Alignment code from LearnOpenCV.com where a very similar problem is solved for aligning color channels. The post also contains code in Python. Hope this helps.
// Read the images to be aligned
Mat im1 = imread("images/image1.jpg");
Mat im2 = imread("images/image2.jpg");
// Convert images to gray scale;
Mat im1_gray, im2_gray;
cvtColor(im1, im1_gray, CV_BGR2GRAY);
cvtColor(im2, im2_gray, CV_BGR2GRAY);
// Define the motion model
const int warp_mode = MOTION_EUCLIDEAN;
// Set a 2x3 or 3x3 warp matrix depending on the motion model.
Mat warp_matrix;
// Initialize the matrix to identity
if ( warp_mode == MOTION_HOMOGRAPHY )
warp_matrix = Mat::eye(3, 3, CV_32F);
else
warp_matrix = Mat::eye(2, 3, CV_32F);
// Specify the number of iterations.
int number_of_iterations = 5000;
// Specify the threshold of the increment
// in the correlation coefficient between two iterations
double termination_eps = 1e-10;
// Define termination criteria
TermCriteria criteria (TermCriteria::COUNT+TermCriteria::EPS, number_of_iterations, termination_eps);
// Run the ECC algorithm. The results are stored in warp_matrix.
findTransformECC(
im1_gray,
im2_gray,
warp_matrix,
warp_mode,
criteria
);
// Storage for warped image.
Mat im2_aligned;
if (warp_mode != MOTION_HOMOGRAPHY)
// Use warpAffine for Translation, Euclidean and Affine
warpAffine(im2, im2_aligned, warp_matrix, im1.size(), INTER_LINEAR + WARP_INVERSE_MAP);
else
// Use warpPerspective for Homography
warpPerspective (im2, im2_aligned, warp_matrix, im1.size(),INTER_LINEAR + WARP_INVERSE_MAP);
// Show final result
imshow("Image 1", im1);
imshow("Image 2", im2);
imshow("Image 2 Aligned", im2_aligned);
waitKey(0);
There is no single function called something like align; you need to implement it yourself or find an existing implementation.
Here is one solution.
You need to extract keypoints from all 3 images and try to match them. Make sure that your keypoint extraction technique is invariant to illumination changes, since the images have different intensity values because of the different exposures. You need to match your keypoints and find some disparity between them. Then you can use that disparity to align your images, as sketched below.
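A rough sketch of that pipeline (my own illustration using ORB features and a RANSAC homography; swap in whatever detector and transform model suit your data):
#include <opencv2/opencv.hpp>

// Sketch: align im2 onto im1 via ORB keypoints + RANSAC homography.
// With strong exposure differences you may need to normalize the images
// first or pick a more illumination-invariant descriptor.
cv::Mat alignWithFeatures(const cv::Mat &im1, const cv::Mat &im2)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create(2000);
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(im1, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(im2, cv::noArray(), kp2, desc2);

    cv::BFMatcher matcher(cv::NORM_HAMMING, true);   // cross-check matching
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    std::vector<cv::Point2f> pts1, pts2;
    for (const cv::DMatch &m : matches)
    {
        pts1.push_back(kp1[m.queryIdx].pt);
        pts2.push_back(kp2[m.trainIdx].pt);
    }

    cv::Mat H = cv::findHomography(pts2, pts1, cv::RANSAC);  // maps im2 -> im1
    cv::Mat aligned;
    cv::warpPerspective(im2, aligned, H, im1.size());
    return aligned;
}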
Remember, this answer is only superficial; for the details you first need to do some research on keypoint/descriptor extraction and keypoint/descriptor matching.
Good luck!

How to determine world coordinates of a camera?

I have a rectangular target of known dimensions and location on a wall, and a mobile camera on a robot. As the robot is driving around the room, I need to locate the target and compute the location of the camera and its pose. As a further twist, the camera's elevation and azimuth can be changed using servos. I am able to locate the target using OpenCV, but I am still fuzzy on calculating the camera's position (actually, I've gotten a flat spot on my forehead from banging my head against a wall for the last week). Here is what I am doing:
Read in previously computed camera intrinsics file
Get the pixel coordinates of the 4 points of the target rectangle from the contour
Call solvePnP with the world coordinates of the rectangle, the pixel coordinates, the camera matrix and the distortion matrix
Call projectPoints with the rotation and translation vectors
???
I have read the OpenCV book, but I guess I'm just missing something on how to use the projected points, rotation and translation vectors to compute the world coordinates of the camera and its pose (I'm not a math wiz) :-(
Update 2013-04-02:
Following the advice from "morynicz", I have written this simple standalone program.
#include <Windows.h>
#include "opencv\cv.h"
using namespace cv;

int main (int argc, char** argv)
{
    const char *calibration_filename = argc >= 2 ? argv [1] : "M1011_camera.xml";
    FileStorage camera_data (calibration_filename, FileStorage::READ);
    Mat camera_intrinsics, distortion;
    vector<Point3d> world_coords;
    vector<Point2d> pixel_coords;
    Mat rotation_vector, translation_vector, rotation_matrix, inverted_rotation_matrix, cw_translate;
    Mat camera_rotation_vector, camera_translation_vector;
    Mat cw_transform = cv::Mat::eye (4, 4, CV_64FC1);

    // Read camera data
    camera_data ["camera_matrix"] >> camera_intrinsics;
    camera_data ["distortion_coefficients"] >> distortion;
    camera_data.release ();

    // Target rectangle coordinates in feet
    world_coords.push_back (Point3d (10.91666666666667, 10.01041666666667, 0));
    world_coords.push_back (Point3d (10.91666666666667, 8.34375, 0));
    world_coords.push_back (Point3d (16.08333333333334, 8.34375, 0));
    world_coords.push_back (Point3d (16.08333333333334, 10.01041666666667, 0));

    // Coordinates of rectangle in camera
    pixel_coords.push_back (Point2d (284, 204));
    pixel_coords.push_back (Point2d (286, 249));
    pixel_coords.push_back (Point2d (421, 259));
    pixel_coords.push_back (Point2d (416, 216));

    // Get vectors for world->camera transform
    solvePnP (world_coords, pixel_coords, camera_intrinsics, distortion, rotation_vector, translation_vector, false, 0);
    // dump_matrix is a small printing helper of mine (not shown)
    dump_matrix (rotation_vector, String ("Rotation vector"));
    dump_matrix (translation_vector, String ("Translation vector"));

    // We need inverse of the world->camera transform (camera->world) to calculate
    // the camera's location
    Rodrigues (rotation_vector, rotation_matrix);
    Rodrigues (rotation_matrix.t (), camera_rotation_vector);
    Mat t = translation_vector.t ();
    camera_translation_vector = -camera_rotation_vector * t;

    printf ("Camera position %f, %f, %f\n", camera_translation_vector.at<double>(0), camera_translation_vector.at<double>(1), camera_translation_vector.at<double>(2));
    printf ("Camera pose %f, %f, %f\n", camera_rotation_vector.at<double>(0), camera_rotation_vector.at<double>(1), camera_rotation_vector.at<double>(2));
}
The pixel coordinates I used in my test are from a real image that was taken about 27 feet left of the target rectangle (which is 62 inches wide and 20 inches high), at about a 45 degree angle. The output is not what I'm expecting. What am I doing wrong?
Rotation vector
2.7005
0.0328
0.4590
Translation vector
-10.4774
8.1194
13.9423
Camera position -28.293855, 21.926176, 37.650714
Camera pose -2.700470, -0.032770, -0.459009
Will it be a problem if my world coordinates have the Y axis inverted from OpenCV's screen Y axis? (The origin of my coordinate system is on the floor to the left of the target, while OpenCV's origin is at the top left of the screen.)
What units is the pose in?
You get the translation and rotation vectors from solvePnP, which tell you where the object is in the camera's coordinates. You need to get the inverse transform.
The transform camera -> object can be written as a matrix [R T; 0 1] for homogeneous coordinates. The inverse of this matrix is, using its special properties, [R^t -R^t*T; 0 1], where R^t is R transposed. You can get the R matrix from the Rodrigues transform. This way you get the translation vector and rotation matrix for the transformation object -> camera coordinates.
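Written out, with C the resulting camera position:
\[
\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}^{-1}
= \begin{bmatrix} R^{\top} & -R^{\top}T \\ 0 & 1 \end{bmatrix},
\qquad C = -R^{\top} T .
\]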
If you know where the object lies in world coordinates, you can use the world->object transform * object->camera transform matrix to extract the camera's translation and pose.
The pose is described either by a single vector or by the R matrix; you will surely find it in your book. If it's "Learning OpenCV", you will find it on pages 401-402 :)
Looking at your code, you need to do something like this:
cv::Mat R;
cv::Rodrigues(rotation_vector, R);
cv::Mat cameraRotationVector;
cv::Rodrigues(R.t(),cameraRotationVector);
cv::Mat cameraTranslationVector = -R.t()*translation_vector;
cameraTranslationVector contains camera coordinates. cameraRotationVector contains camera pose.
It took me forever to understand it, but the pose values are the rotation around each axis (x, y, z).
They are in radians, and the values lie between minus pi and pi (-3.14 to 3.14).
Edit:
I might have been mistaken. I read that the pose is a vector whose direction gives the axis to rotate the camera around, and whose length indicates how much to rotate around that axis.
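A small sketch of reading that axis-angle form out of the vector (using cameraRotationVector from the snippet above):
double angle = cv::norm(cameraRotationVector);   // rotation angle in radians
cv::Mat axis  = cameraRotationVector / angle;    // unit rotation axis (undefined when angle == 0)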

OpenCV Camera calibration use of rotation matrix

http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#calibratecamera
I used the cv::calibrateCamera method with a 9x6 chessboard pattern.
Now I am getting rvecs and tvecs corresponding to each pattern view.
Can somebody explain the format of rvecs and tvecs?
As far as I have figured out, each one is a 3x1 matrix,
and the OpenCV documentation suggests looking at the Rodrigues function.
http://en.wikipedia.org/wiki/Rodrigues'_rotation_formula
As far as Rodrigues is concerned, it is a way to rotate a vector
around a given axis by an angle theta.
But for this we need four values, a unit vector (ux, uy, uz) and the angle, yet OpenCV seems to use only 3 values.
The OpenCV Rodrigues documentation at http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#void Rodrigues(InputArray src, OutputArray dst, OutputArray jacobian)
says that it will convert a 3x1 matrix to a 3x3 rotation matrix.
Is this matrix the same as the one we use in 3D graphics?
Can I convert it to a 4x4 matrix and use it for transformations like the one below?
M4X4 [
x x x 0
x x x 0
x x x 0
0 0 0 1
]
x: the values from the output 3x3 matrix of the Rodrigues function.
Is the relationship
Vout = M4X4 * Vin;
valid, using the matrix above?
The 3x1 rotation vector expresses a rotation matrix by defining an axis of rotation via the direction the vector points and an angle via the magnitude of the vector. Using the OpenCV function Rodrigues(InputArray src, OutputArray dst) you can obtain the corresponding 3x3 rotation matrix, which fits the form you describe.
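To make that concrete, here is a sketch (my own, not from the documentation) of building the 4x4 matrix described in the question; note that OpenCV's rvec/tvec from calibrateCamera map pattern (object) coordinates into the camera frame:
cv::Mat toHomogeneous(const cv::Mat &rvec, const cv::Mat &tvec)
{
    cv::Mat R;
    cv::Rodrigues(rvec, R);                               // 3x1 -> 3x3 rotation matrix
    cv::Mat M = cv::Mat::eye(4, 4, R.type());
    R.copyTo(M(cv::Rect(0, 0, 3, 3)));                    // top-left 3x3: rotation
    tvec.reshape(1, 3).copyTo(M(cv::Rect(3, 0, 1, 3)));   // last column: translation
    return M;                                             // Vout = M * Vin (column vectors)
}
With the last column left at zero, as in the question, Vout = M4X4 * Vin applies only the rotation; putting tvec in that column gives the full rigid transform.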

How to find Camera matrix for Augmented Reality?

I want to augment a virtual object at x, y, z meters with respect to the camera. OpenCV has camera calibration functions, but I don't understand how exactly I can give coordinates in meters.
I tried simulating a camera in Unity but don't get the expected result.
I set the projection matrix as follows and create a unit cube at z = 2.415 + 0.5,
where 2.415 is the distance between the eye and the projection plane (pinhole camera model).
Since the cube's face is at the front clipping plane and its dimensions are unit, shouldn't it cover the whole viewport?
Matrix4x4 m = new Matrix4x4();
m[0, 0] = 1;  m[0, 1] = 0;  m[0, 2] = 0;       m[0, 3] = 0;
m[1, 0] = 0;  m[1, 1] = 1;  m[1, 2] = 0;       m[1, 3] = 0;
m[2, 0] = 0;  m[2, 1] = 0;  m[2, 2] = -0.01f;  m[2, 3] = 0;
m[3, 0] = 0;  m[3, 1] = 0;  m[3, 2] = -2.415f; m[3, 3] = 0;
The global scale of your calibration (i.e. the units of measure of the 3D space coordinates) is determined by the geometry of the calibration object you use. For example, when you calibrate in OpenCV using images of a flat checkerboard, the inputs to the calibration procedure are corresponding pairs (P, p) of 3D points P and their images p. The (X, Y, Z) coordinates of the 3D points are expressed in mm, cm, inches, miles, whatever, as required by the size of the target you use (and the optics that image it), and the 2D coordinates of the images are in pixels. The output of the calibration routine is the set of parameters (the components of the projection matrix P and the non-linear distortion parameters k) that "convert" 3D coordinates expressed in those metrical units into pixels.
If you don't know (or don't want to use) the actual dimensions of the calibration target, you can just fudge them but leave their ratios unchanged (so that, for example, a square remains a square even though the true length of its side may be unknown). In this case your calibration will be determined up to an unknown global scale. This is actually the common case: in most virtual reality applications you don't really care what the global scale is, as long as the results look correct in the image.
For example, if you want to add an even puffier pair of 3D lips on a video of Angelina Jolie, and composite them with the original video so that the brand new fake lips stay attached and look "natural" on her face, you just need to rescale the 3D model of the fake lips so that it overlaps correctly the image of the lips. Whether the model is 1 yard or one mile away from the CG camera in which you render the composite is completely irrelevant.
To augment an object you need to find the camera pose and orientation, which is the same as finding the camera extrinsics. You also have to calculate the camera intrinsics first (which is called calibration).
OpenCV allows you to do all of this, but it is not trivial; it requires work on your own. I'll give you a clue: you first need to recognize something in the scene whose appearance you know, so you can calculate the camera pose by analyzing this object; call it a marker. You can start with the typical fiducials, as they are easy to detect.
Have a look at this thread.
I ended up measuring the field of view manually. Once you know the FOV, you can easily create the projection matrix. No need to worry about units, because in the end the projection is of the form (X*d/Z, Y*d/Z). Whatever the units of X, Y, Z may be, the ratio X/Z remains the same.
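For example, a sketch of turning a measured FOV into an OpenCV-style intrinsic matrix (my own illustration; a full graphics-API projection matrix would additionally need near/far clipping handling):
#include <opencv2/opencv.hpp>
#include <cmath>

// Sketch: pinhole intrinsics from a measured field of view.
// fovX/fovY in radians, image size in pixels; principal point assumed centered.
cv::Mat intrinsicsFromFov(double fovX, double fovY, int width, int height)
{
    double fx = (width  / 2.0) / std::tan(fovX / 2.0);
    double fy = (height / 2.0) / std::tan(fovY / 2.0);
    cv::Mat K = (cv::Mat_<double>(3, 3) <<
                 fx,  0.0, width  / 2.0,
                 0.0, fy,  height / 2.0,
                 0.0, 0.0, 1.0);
    return K;
}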
