Calculate the relative rotation between two different camera poses, using cv::findHomography? - opencv

I have two calibrated cameras, with very different lenses and FOVs.
I have a chessboard visible in both.
What I need is to calculate, very accurately (ideally to within 0.1 degrees), the relative rotation between the cameras.
Using the following:
bool found_chessboard = cv::findChessboardCorners(imageGray, cv::Size(grid_size_x, grid_size_y), corners1, 0);// cv::CALIB_CB_ADAPTIVE_THRESH + cv::CALIB_CB_NORMALIZE_IMAGE + cv::CALIB_CB_FAST_CHECK);
bool found_chessboard2 = cv::findChessboardCorners(imageGray2, cv::Size(grid_size_x, grid_size_y), corners2, 0);// cv::CALIB_CB_ADAPTIVE_THRESH + cv::CALIB_CB_NORMALIZE_IMAGE + cv::CALIB_CB_FAST_CHECK);
if (found_chessboard && found_chessboard2)
{
    cv::cornerSubPix(imageGray, corners1, cv::Size(3, 3), cv::Size(-1, -1), cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 50, 0.01));
    cv::cornerSubPix(imageGray2, corners2, cv::Size(3, 3), cv::Size(-1, -1), cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 50, 0.01));
    // Calculate the homography
    cv::Mat h = cv::findHomography(corners1, corners2);
    // Warp source image to destination
    cv::warpPerspective(img_src, img_dst, h, cv::Size(img_dst.cols, img_dst.rows));
}
I can calculate the homography matrix, and remap one image to another. This looks good.
Using the decompose with the intrinsic matrix:
std::vector<cv::Mat> Rs, Ts;
cv::decomposeHomographyMat(h,
K2,
Rs, Ts,
cv::noArray());
Prints 4 possible solutions, but they all look wrong.
As this is being run on raw images, I imagine that lens distortion would affect the results. Is this correct?
I have also tried stereo calibration to solve this problem, but the results are variable, and I can only seem to get to within 2 degrees or so of the ground truth.
How can I accurately calculate the rotational difference between two very different cameras?

Related

Inverse Perspective Transform?

I am trying to find the bird's eye image from a given image. I also have the rotations and translations (also intrinsic matrix) required to convert it into the bird's eye plane. My aim is to find an inverse homography matrix(3x3).
rotation_x = np.asarray([[1,0,0,0],
[0,np.cos(R_x),-np.sin(R_x),0],
[0,np.sin(R_x),np.cos(R_x),0],
[0,0,0,1]],np.float32)
translation = np.asarray([[1, 0, 0, 0],
[0, 1, 0, 0 ],
[0, 0, 1, -t_y/(dp_y * np.sin(R_x))],
[0, 0, 0, 1]],np.float32)
intrinsic = np.asarray([[s_x * f / (dp_x ),0, 0, 0],
[0, 1 * f / (dp_y ) ,0, 0 ],
[0,0,1,0]],np.float32)
#The Projection matrix to convert the image coordinates to 3-D domain from (x,y,1) to (x,y,0,1); Not sure if this is the right approach
projection = np.asarray([[1, 0, 0],
[0, 1, 0],
[0, 0, 0],
[0, 0, 1]], np.float32)
homography_matrix = intrinsic @ translation @ rotation @ projection
inv = cv2.warpPerspective(source_image, homography_matrix,(w,h),flags = cv2.INTER_CUBIC | cv2.WARP_INVERSE_MAP)
My question is: is this the right approach? I can manually set a suitable (ty, rx) that works, but the result is wrong for the (ty, rx) that is provided.
First premise: your bird's eye view will be correct only for one specific plane in the image, since a homography can only map planes (including the plane at infinity, corresponding to a pure camera rotation).
Second premise: if you can identify a quadrangle in the first image that is the projection of a rectangle in the world, you can directly compute the homography that maps the quad into the rectangle (i.e. the "bird's eye view" of the quad), and warp the image with it, setting the scale so the image warps to a desired size. No need to use the camera intrinsics. Example: you have the image of a building with rectangular windows, and you know the width/height ratio of these windows in the world.
Sometimes you can't find rectangles, but your camera is calibrated, and thus the problem you describe comes into play. Let's do the math. Assume the plane you are observing in the given image is Z=0 in world coordinates. Let K be the 3x3 intrinsic camera matrix and [R, t] the 3x4 matrix representing the camera pose in XYZ world frame, so that if Pc and Pw represent the same 3D point respectively in camera and world coordinates, it is Pc = R*Pw + t = [R, t] * [Pw.T, 1].T, where .T means transposed. Then you can write the camera projection as:
s * p = K * [R, t] * [Pw.T, 1].T
where s is an arbitrary scale factor and p is the pixel that Pw projects onto. But if Pw=[X, Y, Z].T is on the Z=0 plane, the 3rd column of R only multiplies zeros, so we can ignore it. If we then denote with r1 and r2 the first two columns of R, we can rewrite the above equation as:
s * p = K * [r1, r2, t] * [X, Y, 1].T
But K * [r1, r2, t] is a 3x3 matrix that transforms points on a 3D plane to points on the camera plane, so it is a homography.
If the plane is not Z=0, you can repeat the same argument replacing [R, t] with [R, t] * inv([Rp, tp]), where [Rp, tp] is the coordinate transform that maps a frame on the plane, with the plane normal being the Z axis, to the world frame.
Finally, to obtain the bird's eye view, you select a rotation R whose third column (the components of the world's Z axis in camera frame) is opposite to the plane's normal.
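The derivation above can be checked numerically. The following is a minimal sketch in Python with NumPy, not the answerer's code: the intrinsics, the 30 degree tilt, and the translation are made-up example values, and plane_homography is a hypothetical helper name. It builds H = K * [r1, r2, t] and confirms that it projects a point on the Z=0 plane to the same pixel as the full projection K * [R, t].

```python
import numpy as np

def plane_homography(K, R, t):
    """Homography K * [r1, r2, t] mapping (X, Y, 1) on the world plane Z=0 to pixels."""
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

# Assumed example values (not taken from the question).
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(30.0)  # camera tilted about its X axis
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(angle), -np.sin(angle)],
              [0.0, np.sin(angle), np.cos(angle)]])
t = np.array([0.1, -0.2, 5.0])

H = plane_homography(K, R, t)

# A world point on the Z=0 plane, projected with the full camera model...
Pw = np.array([0.7, -0.4, 0.0])
p_full = K @ (R @ Pw + t)
p_full = p_full / p_full[2]

# ...and with the 3x3 homography acting on (X, Y, 1).
p_h = H @ np.array([Pw[0], Pw[1], 1.0])
p_h = p_h / p_h[2]

print(p_full, p_h)
```

Since H maps plane coordinates to pixels, np.linalg.inv(H) maps pixels back to plane coordinates, which is the direction needed for a bird's eye view.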

How do you calculate the average gradient direction and average gradient strength/magnitude

In OpenCV how do you calculate the average gradient strength in a Mat and the average gradient direction?
I have sourced the below methods by googling but I want to confirm I am actually doing this correctly before moving onto the next step.
Is this correct?
Mat src = imread("foo.png", IMREAD_GRAYSCALE); // read image as single-channel grayscale
// Calculate the mean intensity and the std deviation
// Any errors here or am I doing this correctly?
Scalar sMean, sStdDev;
meanStdDev(src, sMean, sStdDev);
double mean = sMean[0];
double stddev = sStdDev[0];
// Calculate the average gradient magnitude/strength across the image
// Any errors here or am I doing this correctly?
Mat dX, dY, mag; // named mag so that it does not hide the magnitude() function
Sobel(src, dX, CV_32F, 1, 0, 1);
Sobel(src, dY, CV_32F, 0, 1, 1);
magnitude(dX, dY, mag);
Scalar sMMean, sMStdDev;
meanStdDev(mag, sMMean, sMStdDev);
double magnitudeMean = sMMean[0];
double magnitudeStdDev = sMStdDev[0];
// Calculate the average gradient direction across the image
// Any errors here or am I doing this correctly?
// cv:: qualification is needed because the local double named mean hides the function
Scalar avgHorizDir = cv::mean(dX);
Scalar avgVertDir = cv::mean(dY);
double avgDir = atan2(-avgVertDir[0], avgHorizDir[0]);
float blurriness = cv::videostab::calcBlurriness(src); // low values = sharper. High values = blurry
Technically those are the correct ways of obtaining the two averages.
The way you compute mean direction uses weighted directional statistics, meaning that pixels without a strong gradient have less influence on the average.
However, for most images this average direction is not very meaningful, as edges exist in all directions and cancel each other out.
If your image is of a single edge, then this will work great.
If your image has lines in it, containing edges in opposite directions, this will not work. In this case, you want to average the double angle (average orientations). The obvious way of doing this is to compute the direction per pixel as an angle, double it, then use directional statistics to average (i.e. convert back to vectors and average those). Doubling the angle causes opposite directions to be mapped to the same value, so averaging doesn't cancel them out.
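As an illustration of the double-angle idea, here is a minimal NumPy sketch. It assumes the gradients dx and dy are already available as arrays, weights each pixel by its gradient magnitude (a choice made for this sketch), and uses atan2(dy, dx) without the sign flip for an image y axis that points down. mean_orientation is a hypothetical helper name.

```python
import numpy as np

def mean_orientation(dx, dy):
    """Magnitude-weighted mean orientation in radians, defined modulo pi."""
    mag = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)
    # Doubling the angle maps opposite directions onto the same unit vector.
    c = np.sum(mag * np.cos(2.0 * theta))
    s = np.sum(mag * np.sin(2.0 * theta))
    # Halve the averaged double angle to get back to an orientation.
    return 0.5 * np.arctan2(s, c)

# Two gradients pointing in exactly opposite directions along the 45 degree diagonal.
dx = np.array([1.0, -1.0])
dy = np.array([1.0, -1.0])

plain_x, plain_y = dx.mean(), dy.mean()  # plain vector average cancels to (0, 0)
orient = mean_orientation(dx, dy)        # orientation survives the averaging

print(plain_x, plain_y, np.degrees(orient))
```

The plain average of the two vectors is zero, so it carries no direction at all, while the double-angle average still reports the shared 45 degree orientation.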
Another simple way to average orientations is to take the average of the tensor field obtained by the outer product of the gradient field with itself, and determine the direction of the eigenvector corresponding to the largest eigenvalue. The tensor field is obtained as follows:
// Element-wise products; operator* on cv::Mat would be a matrix multiplication
Mat Sxx = dX.mul(dX);
Mat Syy = dY.mul(dY);
Mat Sxy = dX.mul(dY);
This should then be averaged:
Scalar mSxx = mean(Sxx);
Scalar mSyy = mean(Syy);
Scalar mSxy = mean(Sxy);
These values form a 2x2 real-valued symmetric matrix:
| mSxx mSxy |
| mSxy mSyy |
It is relatively straight-forward to determine its eigendecomposition, and can be done analytically. I don’t have the equations on hand right now, so I’ll leave it as an exercise to the reader. :)
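For reference, the closed form for a symmetric 2x2 matrix is standard: the eigenvalues are (mSxx + mSyy)/2 ± sqrt(((mSxx - mSyy)/2)^2 + mSxy^2), and the eigenvector of the largest eigenvalue lies at the angle 0.5 * atan2(2*mSxy, mSxx - mSyy). The sketch below applies this in NumPy to precomputed gradient arrays; tensor_orientation is a hypothetical helper name, and the sample gradients are made up.

```python
import numpy as np

def tensor_orientation(dx, dy):
    """Dominant orientation (radians) and eigenvalues of the averaged gradient tensor."""
    sxx = np.mean(dx * dx)
    syy = np.mean(dy * dy)
    sxy = np.mean(dx * dy)
    # Closed-form eigendecomposition of [[sxx, sxy], [sxy, syy]].
    root = np.hypot(0.5 * (sxx - syy), sxy)
    lam_max = 0.5 * (sxx + syy) + root
    lam_min = 0.5 * (sxx + syy) - root
    theta = 0.5 * np.arctan2(2.0 * sxy, sxx - syy)
    return theta, lam_max, lam_min

# Made-up gradients, all along the 45 degree diagonal, some with opposite sign.
dx = np.array([1.0, -1.0, 2.0])
dy = np.array([1.0, -1.0, 2.0])
theta, lam_max, lam_min = tensor_orientation(dx, dy)

print(np.degrees(theta), lam_max, lam_min)
```

The two eigenvalues are also useful on their own: when lam_min is close to lam_max, there is no dominant orientation and theta is not meaningful.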

Field of view of a GoPro camera

I have calibrated my GoPro Hero 4 Black using Camera calibration toolbox for Matlab and calculated its fields of view and focal length using OpenCV's calibrationMatrixValues(). These, however, differ from GoPro's specifications. Instead of FOVs of 118.2/69.5 I get 95.4/63.4, and a focal length of 2.8mm instead of 17.2mm. Obviously something is wrong.
I suppose the calibration itself is correct since image undistortion seems to be working well.
Can anyone please give me a hint where I made a mistake? I am posting my code below.
Thanks.
Code
cameraMatrix = new Mat(3, 3, CvType.CV_64F); // type code 6 is CV_64F
for (int i = 0; i < cameraMatrix.height(); i ++)
for (int j = 0; j < cameraMatrix.width(); j ++) {
cameraMatrix.put(i, j, 0);
}
cameraMatrix.put(0, 0, 582.18394);
cameraMatrix.put(0, 2, 663.50655);
cameraMatrix.put(1, 1, 582.52915);
cameraMatrix.put(1, 2, 378.74541);
cameraMatrix.put(2, 2, 1.);
org.opencv.core.Size size = new org.opencv.core.Size(1280, 720);
//output parameters
double [] fovx = new double[1];
double [] fovy = new double[1];
double [] focLen = new double[1];
double [] aspectRatio = new double[1];
Point ppov = new Point(0, 0);
org.opencv.calib3d.Calib3d.calibrationMatrixValues(cameraMatrix, size,
6.17, 4.55, fovx, fovy, focLen, ppov, aspectRatio);
System.out.println("FoVx: " + fovx[0]);
System.out.println("FoVy: " + fovy[0]);
System.out.println("Focal length: " + focLen[0]);
System.out.println("Principal point of view; x: " + ppov.x + ", y: " + ppov.y);
System.out.println("Aspect ratio: " + aspectRatio[0]);
Results
FoVx: 95.41677635378488
FoVy: 63.43170132212425
Focal length: 2.8063085232812504
Principal point of view; x: 3.198308916796875, y: 2.3934605770833333
Aspect ratio: 1.0005929569269807
GoPro specifications
https://gopro.com/help/articles/Question_Answer/HERO4-Field-of-View-FOV-Information
Edit
Matlab calibration results
Focal Length: fc = [ 582.18394 582.52915 ] ± [ 0.77471 0.78080 ]
Principal point: cc = [ 663.50655 378.74541 ] ± [ 1.40781 1.13965 ]
Skew: alpha_c = [ -0.00028 ] ± [ 0.00056 ] => angle of pixel axes = 90.01599 ± 0.03208 degrees
Distortion: kc = [ -0.25722 0.09022 -0.00060 0.00009 -0.01662 ] ± [ 0.00228 0.00276 0.00020 0.00018 0.00098 ]
Pixel error: err = [ 0.30001 0.28188 ]
One of the images used for calibration
And the undistorted image
You have entered 6.17mm and 4.55mm for the sensor size in OpenCV, which corresponds to an aspect ratio of 1.36, whereas your resolution (1280x720) has an aspect ratio of 1.78 (16:9 format).
Did you crop your image before MATLAB calibration?
The pixel size seems to be 1.55µm from this GoPro page (which is, by the way, astonishingly small!). If pixels are square, and they should be on this type of commercial camera, that means your inputs are not consistent. The computed sensor size should be:
[Sensor width, Sensor height] = [1280, 720]*1.55*10^-3 = [1.98, 1.12] mm
Even if considering the maximal video resolution which is 3840 x 2160, we obtain [5.95, 3.35]mm, still different from your input.
Please see this explanation about equivalent focal length to understand why the actual focal length of the camera is not 17.2 but 17.2*5.95/36 ~ 2.8mm. In that case, compute the FOV using the formulas here, for instance. You will indeed find values of 93.5°/61.7° (close to your outputs but still not what is written in the specifications, because there is probably some optical distortion due to the wide angle).
What I do not understand though, is how the focal distance returned can be right whereas sensor size entered is wrong. Could you give more info and/or send an image?
Edits after question updates
On these cameras, with a working resolution of 1280x720, the image is downsampled but not cropped, so what I said above about sensor dimensions does not apply. The sensor size to consider is indeed the one used (6.17x4.55) as explained in your first comment.
The FOV is constrained by the calibration matrix inputs (fx, fy, cx, cy) given in pixels and the resolution. You can check it by typing:
2*DEGREES(ATAN(1280/(2*582.18394))) (=95.416776...°)
This FOV value is smaller than expected, but judging by the undistorted image, your MATLAB distortion model is right and the calibration is correct. The barrel distortion due to the wide angle seems well corrected by the rewarp you applied.
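The same check can be scripted. This is a small sketch using only Python's standard library with the calibration values from the question. It assumes the principal point sits at the image center, which is the assumption built into the 2*atan(size/(2*f)) formula, and fov_degrees is a hypothetical helper name. The focal length in millimetres is obtained by scaling fx by the sensor width per pixel.

```python
import math

def fov_degrees(focal_px, size_px):
    """Full field of view in degrees for a pinhole camera with a centered principal point."""
    return 2.0 * math.degrees(math.atan(size_px / (2.0 * focal_px)))

# Values from the question's calibration and calibrationMatrixValues() call.
fx, fy = 582.18394, 582.52915
width, height = 1280, 720
sensor_width_mm = 6.17

fov_x = fov_degrees(fx, width)
fov_y = fov_degrees(fy, height)
focal_mm = fx * sensor_width_mm / width  # pixels -> millimetres

print(f"FoVx: {fov_x:.4f}  FoVy: {fov_y:.4f}  focal length: {focal_mm:.4f} mm")
```

These reproduce the figures in the question's Results block, which shows that the reported FOV follows directly from fx, fy and the image size.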
However, MATLAB toolbox uses a pinhole model, which is linear and cannot account for intrinsic parameters such as lens distortion. I assume this from the page :
https://fr.mathworks.com/help/vision/ug/camera-calibration.html
Hence, my best guess is that unless you find a model which fits more accurately the Gopro camera (maybe a wide-angle lens model), MATLAB calibration will return an intrinsic camera matrix corresponding to the "linear" undistorted image and the FOV will indeed be smaller (in the case of barrel distortion). You will have to apply distortion coefficients associated to the calibration to retrieve the actual FOV value.
We can see in the corrected image that the side parts of the FOV fall out of bounds. If you had warped the image entirely, you would find that some undistorted pixel coordinates exceed [-1280/2;+1280/2] horizontally (and likewise vertically). Then, replacing opencv.core.Size(1280, 720) by the most extreme ranges obtained, you would hopefully retrieve the values from the GoPro website.
In conclusion, I think you can rely on the focal distance value that you obtained if you make measurements in the center of your image, otherwise there is too much distortion and it doesn't apply.

Pose estimation: solvePnP and epipolar geometry do not agree

I have a relative camera pose estimation problem where I am looking at a scene with differently oriented cameras spaced a certain distance apart. Initially, I am computing the essential matrix using the 5 point algorithm and decomposing it to get the R and t of camera 2 w.r.t camera 1.
I thought it would be a good idea to do a check by triangulating the two sets of image points into 3D, and then running solvePnP on the 3D-2D correspondences, but the result I get from solvePnP is way off. I am trying to do this to "refine" my pose, as the scale can change from one frame to another. In one case, I had a 45 degree rotation between camera 1 and camera 2 about the Z axis, and the epipolar geometry part gave me this answer:
Relative camera rotation is [1.46774, 4.28483, 40.4676]
Translation vector is [-0.778165583410928; -0.6242059242696293; -0.06946429947410336]
solvePnP, on the other hand..
Camera1: rvecs [0.3830144497209735; -0.5153903947692436; -0.001401186630803216]
tvecs [-1777.451836911453; -1097.111339375749; 3807.545406775675]
Euler1 [24.0615, -28.7139, -6.32776]
Camera2: rvecs [1407374883553280; 1337006420426752; 774194163884064.1] (!!)
tvecs[1.249151852575814; -4.060149502748567; -0.06899980661249146]
Euler2 [-122.805, -69.3934, 45.7056]
Something is troublingly off with the rvecs of camera2 and tvec of camera 1. My code involving the point triangulation and solvePnP looks like this:
points1.convertTo(points1, CV_32F);
points2.convertTo(points2, CV_32F);
// Normalize image points (subtract the principal point, divide by the focal length)
points1.col(0) = (points1.col(0) - pp.x) / focal;
points2.col(0) = (points2.col(0) - pp.x) / focal;
points1.col(1) = (points1.col(1) - pp.y) / focal;
points2.col(1) = (points2.col(1) - pp.y) / focal;
points1 = points1.t(); points2 = points2.t();
cv::triangulatePoints(P1, P2, points1, points2, points3DH);
cv::Mat points3D;
convertPointsFromHomogeneous(Mat(points3DH.t()).reshape(4, 1), points3D);
cv::solvePnP(points3D, points1.t(), K, noArray(), rvec1, tvec1, 1, CV_ITERATIVE );
cv::solvePnP(points3D, points2.t(), K, noArray(), rvec2, tvec2, 1, CV_ITERATIVE );
And then I am converting the rvecs through Rodrigues to get the Euler angles: but since rvecs and tvecs themselves seem to be wrong, I feel something's wrong with my process. Any pointers would be helpful. Thanks!

Calculate distance (disparity) OpenCV

-- Update 2 --
The following article is really useful (although it is using Python instead of C++) if you are using a single camera to calculate the distance: Find distance from camera to object/marker using Python and OpenCV
Best link is Stereo Webcam Depth Detection. The implementation of this open source project is really clear.
Below is the original question.
For my project I am using two cameras (stereo vision) to track objects and to calculate the distance. I calibrated them with the sample code of OpenCV and generated a disparity map.
I already implemented a method to track objects based on color (this generates a threshold image).
My question: How can I calculate the distance to the tracked colored objects using the disparity map/ matrix?
Below you can find a code snippet that gets the x,y and z coordinates of each pixel. The question: Is Point.z in cm, pixels, mm?
Can I get the distance to the tracked object with this code?
Thank you in advance!
cvReprojectImageTo3D(disparity, Image3D, _Q);
vector<CvPoint3D32f> PointArray;
CvPoint3D32f Point;
for (int y = 0; y < Image3D->rows; y++) {
float *data = (float *)(Image3D->data.ptr + y * Image3D->step);
for (int x = 0; x < Image3D->cols * 3; x = x + 3)
{
Point.x = data[x];
Point.y = data[x+1];
Point.z = data[x+2];
PointArray.push_back(Point);
//Depth > 10
if(Point.z > 10)
{
printf("%f %f %f", Point.x, Point.y, Point.z);
}
}
}
cvReleaseMat(&Image3D);
--Update 1--
For example I generated this thresholded image (of the left camera). I almost have the same of the right camera.
Besides the above threshold image, the application generates a disparity map. How can I get the Z-coordinates of the pixels of the hand in the disparity map?
I actually want to get all the Z-coordinates of the pixels of the hand to calculate the average Z-value (distance) (using the disparity map).
See these links: OpenCV: How-to calculate distance between camera and object using image?, Finding distance from camera to object of known size, http://answers.opencv.org/question/5188/measure-distance-from-detected-object-using-opencv/
If that doesn't solve your problem, add more details: why it isn't working, etc.
The math for converting disparity (in pixels or image width percentage) to actual distance is pretty well documented (and not very difficult) but I'll document it here as well.
Below is an example given a disparity image (in pixels) and an input image width of 2K (2048 pixels across) image:
Convergence Distance is determined by the rotation between camera lenses. In this example it will be 5 meters. Convergence distance of 5 (meters) means that the disparity of objects 5 meters away is 0.
CD = 5 (meters)
Inverse of convergence distance is: 1 / CD
IZ = 1/5 = 0.2 (1/meters)
Size of camera's sensor in meters
SS = 0.035 (meters) //35mm camera sensor
The width of a pixel on the sensor in meters
PW = SS/image resolution = 0.035 / 2048(image width) = 0.00001708984
The focal length of your cameras in meters
FL = 0.07 //70mm lens
InterAxial distance: The distance from the center of left lens to the center of right lens
IA = 0.0025 //2.5mm
The combination of the physical parameters of your camera rig
A = FL * IA / PW
Camera Adjusted disparity: (For left view only, right view would use positive [disparity value])
AD = 2 * (-[disparity value] / A)
From here you can compute actual distance using the following equation:
realDistance = 1 / (IZ - AD)
This equation only works for "toe-in" camera systems, parallel camera rigs will use a slightly different equation to avoid infinity values, but I'll leave it at this for now. If you need the parallel stuff just let me know.
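The steps above can be collected into one function. This is a sketch in plain Python that follows the formulas exactly as written in this answer, with the example rig values as defaults. The function and parameter names are made up for illustration, and the left_view flag encodes the note about the sign of the disparity for the left and right views.

```python
def toe_in_distance(disparity_px,
                    convergence_m=5.0,       # CD
                    sensor_width_m=0.035,    # SS
                    image_width_px=2048,
                    focal_length_m=0.07,     # FL
                    interaxial_m=0.0025,     # IA
                    left_view=True):
    """Distance in meters for a toe-in rig, per the formulas in the answer above."""
    iz = 1.0 / convergence_m                   # IZ, inverse convergence distance
    pw = sensor_width_m / image_width_px       # PW, width of one pixel on the sensor
    a = focal_length_m * interaxial_m / pw     # A, combined rig parameter
    signed = -disparity_px if left_view else disparity_px
    ad = 2.0 * (signed / a)                    # AD, camera adjusted disparity
    # Note: this divides by zero when IZ equals AD (a point at infinity).
    return 1.0 / (iz - ad)

at_convergence = toe_in_distance(0.0)  # zero disparity -> convergence distance
print(at_convergence)
print(toe_in_distance(1.0), toe_in_distance(1.0, left_view=False))
```

With zero disparity the function returns the convergence distance, which matches the definition of convergence distance given above.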
# Fragment from inside the frame-capture loop; the loop header and the code
# that fills puntos with (x, y, w, h) boxes are not shown.
    if len(puntos) == 2:
        x1, y1, w1, h1 = puntos[0]
        x2, y2, w2, h2 = puntos[1]
        if x1 < x2:
            # Gap between the right edge of box 1 and the left edge of box 2
            distancia_pixeles = abs(x2 - (x1+w1))
            # 720 px span the 29.7 cm side of the A4 sheet
            distancia_cm = (distancia_pixeles*29.7)/720
            cv2.putText(imagen_A4, "{:.2f} cm".format(distancia_cm), (x1+w1+distancia_pixeles//2, y1-30), 2, 0.8, (0,0,255), 1,
                cv2.LINE_AA)
            cv2.line(imagen_A4,(x1+w1,y1-20),(x2, y1-20),(0, 0, 255),2)
            cv2.line(imagen_A4,(x1+w1,y1-30),(x1+w1, y1-10),(0, 0, 255),2)
            cv2.line(imagen_A4,(x2,y1-30),(x2, y1-10),(0, 0, 255),2)
        else:
            distancia_pixeles = abs(x1 - (x2+w2))
            distancia_cm = (distancia_pixeles*29.7)/720
            cv2.putText(imagen_A4, "{:.2f} cm".format(distancia_cm), (x2+w2+distancia_pixeles//2, y2-30), 2, 0.8, (0,0,255), 1,
                cv2.LINE_AA)
            cv2.line(imagen_A4,(x2+w2,y2-20),(x1, y2-20),(0, 0, 255),2)
            cv2.line(imagen_A4,(x2+w2,y2-30),(x2+w2, y2-10),(0, 0, 255),2)
            cv2.line(imagen_A4,(x1,y2-30),(x1, y2-10),(0, 0, 255),2)
    cv2.imshow('imagen_A4',imagen_A4)
    cv2.imshow('frame',frame)
    k = cv2.waitKey(1) & 0xFF
    if k == 27:  # Esc key
        break
cap.release()
cv2.destroyAllWindows()
I think this is a good way to measure the distance between two objects
