Applying OpenCV Pose Estimation to Blender Camera

The Problem
I'm trying to use Blender to create synthetic images for use with OpenCV's pose estimation (specifically, OpenCV's findEssentialMat and recoverPose). However, I found that the rotation matrix R that OpenCV returns corrects for rotations about the camera's Y and Z axes, but not its X axis. I suspect this is because Blender and OpenCV have different camera models (see diagram), but I can't figure out how to correct for this. How would I take a rotation matrix made using OpenCV's camera model and apply it to Blender's camera model?
Additional Details
To test this, I rendered a scene from (0, 0, 30) using an identity camera rotation, then rotated the camera by 10 degrees about its X, Y, and Z axes in turn. First, here's the identity rotation matrix (no rotation):
Here's a 10-degree rotation around X:
And here's a 10-degree rotation around Y:
Finally, here's a 10-degree rotation around Z:
Applying the rotation matrix OpenCV returns (for the estimated transformation between the rotated image and the original) corrects for all of these rotations except around X, which looks like this:
It seems that instead of correctly rotating by -10 degrees around X, the rotation matrix rotates a further 10 degrees around X.

Going from an OpenCV matrix to Blender could be accomplished by multiplying by another rotation matrix to compensate for the change in coordinate systems.
mat = [[1,  0,  0],
       [0, -1,  0],
       [0,  0, -1]]
You said the Y and Z components are already being compensated for, so perhaps the negative of that matrix is what you need?
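As a sketch of that idea (assuming OpenCV's camera convention of X right, Y down, Z forward and Blender's X right, Y up, Z backward; the helper name is my own), the rotation recovered by OpenCV can be conjugated by that matrix before applying it to the Blender camera:

import numpy as np

# Assumed convention change between the OpenCV (X right, Y down, Z forward)
# and Blender (X right, Y up, Z backward) camera frames.
cv_to_blender = np.diag([1.0, -1.0, -1.0])

def opencv_rotation_to_blender(R_cv):
    # Re-express the OpenCV rotation in Blender's camera frame; the
    # conversion matrix is its own inverse, so conjugate with it directly.
    return cv_to_blender @ R_cv @ cv_to_blender

# R_cv would be the rotation returned by cv2.recoverPose(...).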

Related

After Pose Estimation 3D Coordinate Axes Misplaced

As a newbie, I'm trying to calculate the pose of a planar object using OpenCV's solvePnP. However, I see a weird output: the axes are always drawn at the corner of my frame. To draw the axes I use:
drawFrameAxes(frame_copy, cameraMatrix, distanceCoeffisions, rvec, tvec, length);
The output I get is as follows:
P.S. (X: red, Y: green, Z: blue)
I don't have any depth information.
I am not sure if this is correct, but to obtain the 3D points I use the inliers and set the z coordinate to 0:
Points.push_back(Point3f(inliers[i].pt.x, inliers[i].pt.y, 0));
So what could be the problem? Any resource pointers or suggestions are welcome.
I solved the problem.
Solution: I fixed the camera calibration, and that resolved the issue.
Thanks!
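For reference, here is a minimal sketch of the usual solvePnP + drawFrameAxes flow; all values (object points, image points, intrinsics) are invented for illustration rather than taken from the post:

import numpy as np
import cv2

# Planar object points (z = 0) in a real-world unit such as millimetres,
# with the matching detected pixel coordinates (both invented here).
object_points = np.float32([[0, 0, 0], [50, 0, 0], [50, 50, 0], [0, 50, 0]])
image_points = np.float32([[320, 240], [420, 242], [418, 338], [322, 336]])

# Intrinsics and distortion would normally come from camera calibration.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)

frame = np.zeros((480, 640, 3), np.uint8)  # stand-in for the captured frame
if ok:
    cv2.drawFrameAxes(frame, camera_matrix, dist_coeffs, rvec, tvec, 25.0)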

Correlation between OpenCV camera calibration matrix and OpenGL projection matrix

I'm having quite a hard time rendering a .stl model with the same dimensions and proportions as the real object shown in a photograph.
I put the real object at the centre of the viewing area and I would like to flip between the model and the real object (the real object is 9 times smaller than the model).
First, I calibrate the camera using OpenCV as in the tutorial, with the focus set optimally for the real object.
After I get the cx, cy, fx and fy values, I create a perspective matrix with these values and I use the glMultMatrixd function.
Finally, I resize the photo to be the same size as my OpenGL window and compare the photo with the rendered model.
But I have some problems with the result:
there is proportion distortion (wider than tall)
there is perspective distortion (the camera is perpendicular to the object, so I should see only the top of the object, but the side of the object is also visible)
the size is not compatible with the real object (a little smaller)
My perspective matrix is below:
GLdouble perspMatrix[16] = { fx / cx, 0, 0, 0,
                             0, fy / cy, 0, 0,
                             0, 0, -(znear + zfar) / (znear - zfar), 2 * zfar * znear / (zfar - znear),
                             0, 0, -1, 0 };
OpenGL expects matrices in column-major memory order. The data structure you show above is in row-major memory order (the normal writing order, which OpenCV also uses as memory order), so you need to transpose it before passing it to OpenGL.
You're also constructing a symmetrical viewing frustum (right==left==cx and top==bottom==cy), which assumes the optical axis (cx,cy) is at the centre of the camera's imager. Real cameras are never so perfect, which is why we have to calibrate them. You need an asymmetric frustum, using left==cx, right==(width-cx), etc; this guide should help.
Note that in OpenGL, +Y is up, and +Z is behind the camera (in OpenCV +Y is down, +Z is in front of the camera), so check the signs on your matrix.
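For reference, here is a sketch of one common way to build such an asymmetric projection matrix from the OpenCV intrinsics (the function and variable names are mine, and the exact sign conventions depend on your setup, so treat it as a starting point rather than a drop-in fix):

import numpy as np

def opencv_to_opengl_projection(fx, fy, cx, cy, width, height, znear, zfar):
    # Asymmetric frustum extents on the near plane, offset by the principal
    # point (cx, cy), which is assumed to be measured from the top-left pixel.
    left = -cx * znear / fx
    right = (width - cx) * znear / fx
    top = cy * znear / fy            # OpenGL +Y is up, image +y is down
    bottom = -(height - cy) * znear / fy

    proj = np.array([
        [2 * znear / (right - left), 0, (right + left) / (right - left), 0],
        [0, 2 * znear / (top - bottom), (top + bottom) / (top - bottom), 0],
        [0, 0, -(zfar + znear) / (zfar - znear), -2 * zfar * znear / (zfar - znear)],
        [0, 0, -1, 0],
    ])
    # glMultMatrixd expects column-major memory order, so hand it the
    # transposed matrix flattened to 16 doubles.
    return proj.T.reshape(-1)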

Determining perspective distortion from Euler angles

I have the readings from a gyroscope attached to a camera describing the orientation of the camera in 3D (say with 3 Euler angles).
I take a picture (of, say, a flat plane) from this pose. I then want to transform the image into another image, as though it had been taken with the camera perpendicular to the plane itself.
How would I do something like this in OpenCV? Can someone point me in the correct direction?
You can check out how to calculate the rotation matrix using the roll-pitch-yaw angles here: http://planning.cs.uiuc.edu/node102.html
A transformation matrix is T = [R t; 0 1] (in MATLAB notation).
Here, you can place the translation as a 3x1 vector in 't' and the calculated rotation matrix in 'R'.
Since some mathematical information is missing, I assume the Z-axis of the image and the camera are parallel. In this case, you have to add a 90° rotation about either the X or the Y axis to get a perpendicular view; this takes care of the orientation.
The perspectiveTransform() function should be helpful from there on.
Check out this question for code insights: How to calculate perspective transform for OpenCV from rotation angles?
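As a sketch of that approach (assuming a rotation-only camera motion and known intrinsics K; warpPerspective is used here to warp the whole image, while perspectiveTransform would be the equivalent for individual points):

import numpy as np
import cv2

def rotation_from_rpy(roll, pitch, yaw):
    # Rotation matrix from roll-pitch-yaw angles in radians (one common
    # Z-Y-X convention; check it against the linked reference).
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def correct_view(image, K, roll, pitch, yaw):
    # For a purely rotating camera the image-to-image homography is
    # H = K * R^T * K^-1, which undoes the measured camera rotation.
    R = rotation_from_rpy(roll, pitch, yaw)
    H = K @ R.T @ np.linalg.inv(K)
    return cv2.warpPerspective(image, H, (image.shape[1], image.shape[0]))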

Undistortion without camera matrix

Is there a way to undistort and map a view to another without knowing the camera matrix?
Goal is to transform this kind of view:
to a 2D rectangle:
using minimum number of distorted points.
The distorted view is not generated by a camera and looks like it has barrel distortion along the y axis. Along the x axis, things look a little easier, since it looks like a direct polar transformation.

Finding the real-world coordinates of an image point

I have been searching through lots of resources on the internet for many days, but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since the object is on a plane, all I need are the x and y positions (not z). For this purpose I have chosen to go with image processing. The camera (single view, not stereo) position and orientation are fixed with respect to a reference coordinate system on the plane, and both are known.
I have detected the image pixel coordinates of the centers of the circles using OpenCV. All I need now is to convert those coordinates to real-world coordinates.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site, and other sites as well, a homographic transformation is given as:
p = C[R|T]P, where P is the real-world coordinate and p is the pixel coordinate (in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix, and T is the translation vector. I have followed a tutorial on calibrating the camera in OpenCV (using the cameraCalibration source file); I have 9 good chessboard images, and as output I have the intrinsic camera matrix and the translation and rotation parameters for each image.
I have the 3x3 intrinsic camera matrix (focal lengths and center pixels) and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 block and T is the right 3x1 column. According to the p = C[R|T]P formula, I assume that by multiplying these parameter matrices with P (world) we get p (pixel). But what I need is to project the p (pixel) coordinates back to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we could manipulate the transformation as P = [R|T]^-1 * C^-1 * p. However, this is in a Euclidean coordinate system, and I don't know whether such a thing is possible with homogeneous coordinates. Moreover, the 3x4 matrix [R|T] is not invertible, and I don't know whether this is the correct way to go.
The intrinsic and extrinsic parameters are known; all I need is the real-world coordinate of the point on the ground plane. Since the point is on a plane, the coordinates are two-dimensional (depth is not important, in contrast to general single-view geometry). The camera is fixed (position and orientation). How should I find the real-world coordinates of a point in an image captured by a single-view camera?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, in the Calibration -> Homography section, it is written that q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], and s is an "up to" scale factor, which I assume is related to the homography concept, though I am not entirely clear on it; q is the pixel coordinate and Q is the real-world coordinate. It says that in order to get the real-world coordinate (on the chessboard plane) of an object detected in the image plane, you set Z = 0, which also makes the third column of W drop out (the rotation about that axis, I assume). Trimming these unnecessary parts, W becomes a 3x3 matrix, and H = MW is a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply it with q to get Q = [X Y 1], where the Z coordinate has been trimmed.
I applied the algorithm described above and got results that cannot lie within the image corners (the image plane was parallel to the camera plane, about 30 cm in front of the camera, and I got results like 3000; the chessboard square sizes were entered in millimetres, so I assume the output real-world coordinates are again in millimetres). Anyway, I am still trying things. By the way, the results were at first very, very large, but I divide all values in Q by the third component of Q to get (X, Y, 1).
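For concreteness, here is a sketch of that book procedure (the variable names are mine; M is the 3x3 intrinsic matrix and R, T are the extrinsics of the image whose chessboard plane defines Z = 0):

import numpy as np

def pixel_to_plane(u, v, M, R, T):
    # Drop the third column of [R|T] since Z = 0 on the chessboard plane,
    # giving the 3x3 homography H = M * W with q ~ H * Q.
    W = np.hstack([R[:, :2], np.asarray(T).reshape(3, 1)])
    H = M @ W
    # Invert the homography and normalize so that Q = [X, Y, 1].
    Q = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return Q / Q[2]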
FINAL EDIT
I could not get the camera calibration methods to work. In hindsight, I should have started with perspective projection and a perspective transform. That way I obtained very good estimates with a perspective transform between the image plane and the physical plane (generating the transform from 4 pairs of corresponding coplanar points on the two planes), and then simply applied the transform to the image pixel points.
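A sketch of that final approach (the coordinates below are made up for illustration):

import numpy as np
import cv2

# Four corresponding coplanar points: pixel positions in the image and their
# known physical positions on the plane (e.g. in millimetres).
pixel_pts = np.float32([[102, 54], [598, 60], [612, 430], [95, 442]])
plane_pts = np.float32([[0, 0], [300, 0], [300, 200], [0, 200]])

# Perspective transform from the image plane to the physical plane.
H = cv2.getPerspectiveTransform(pixel_pts, plane_pts)

# Map detected circle centres (pixels) onto the plane; the input must have
# shape (N, 1, 2) for cv2.perspectiveTransform.
circle_centres = np.float32([[[250, 200]], [[400, 310]]])
world_xy = cv2.perspectiveTransform(circle_centres, H)
print(world_xy.reshape(-1, 2))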
You said "i have the intrinsic camera matrix, and translational and rotational params of each of the image.” but these are translation and rotation from your camera to your chessboard. These have nothing to do with your circle. However if you really have translation and rotation matrices then getting 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C-1*[u, v, 1], where u=col-w/2 and v=h/2-row, where col, row are image column and row and w, h are image width and height. As a result you will obtain 3d point with so-called camera normalized coordinates p = [x, y, z]T. All you need to do now is to subtract the translation and apply a transposed rotation: P=RT(p-T). The order of operations is inverse to the original that was rotate and then translate; note that transposed rotation does the inverse operation to original rotation but is much faster to calculate than R-1.
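A sketch of those two steps (the function and argument names are mine; here the raw pixel column and row are used directly, on the assumption that C already contains the principal point, rather than the re-centred u, v above):

import numpy as np

def pixel_to_world(col, row, C, R, T):
    # Back-project the pixel to camera-normalized coordinates: p = C^-1 [u, v, 1]^T.
    p = np.linalg.inv(C) @ np.array([col, row, 1.0])
    # Undo the extrinsics: P = R^T (p - T), i.e. subtract the translation and
    # apply the transposed rotation.
    return R.T @ (p - np.asarray(T).reshape(3))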
