extrinsic matrix computation with opencv

I am using OpenCV to calibrate my webcam. I have fixed the webcam to a rig so that it stays static, and I have moved a chessboard calibration pattern in front of the camera, using the detected points to compute the calibration. This is the procedure found in many OpenCV examples (https://docs.opencv.org/3.1.0/dc/dbb/tutorial_py_calibration.html).
Now, this gives me the camera intrinsic matrix and a rotation and translation component for each of these chessboard views, mapping from the chessboard space to world space.
However, what I am interested in is the global extrinsic matrix: once I have removed the checkerboard, I want to be able to specify a point in the image (its x, y coordinates and its height) and get its position in world space. As far as I understand, I need both the intrinsic and extrinsic matrices for this. How should I proceed to compute the extrinsic matrix from here? Can I use the measurements that I have already gathered from the chessboard calibration step to compute the extrinsic matrix as well?

Let me give some context. Consider the following picture (from https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html):
The camera has "attached" a rigid reference frame (Xc,Yc,Zc). The intrinsic calibration that you successfully performed allows you to convert a point (Xc,Yc,Zc) into its projection on the image (u,v), and a point (u,v) in the image to a ray in (Xc,Yc,Zc) (you can only get it up to a scaling factor).
In practice, you want to place the camera in an external "world" reference frame, let's call it (X,Y,Z). Then there is a rigid transformation, represented by a rotation matrix, R, and a translation vector T, such that:
[Xc, Yc, Zc]^T = R [X, Y, Z]^T + T
That's the extrinsic calibration (which can be written also as a 4x4 matrix, that's what you call the extrinsic matrix).
Now, the answer. To obtain R and T, you can do the following:
Fix your world reference frame, for example the ground can be the (x,y) plane, and choose an origin for it.
Set some points with known coordinates in this reference frame, for example, points in a square grid on the floor.
Take a picture and get the corresponding 2D image coordinates.
Use solvePnP to obtain the rotation and translation, with the following parameters:
objectPoints: the 3D points in the world reference frame.
imagePoints: the corresponding 2D points in the image in the same order as objectPoints.
cameraMatrix: the intrinsic matrix you already have.
distCoeffs: the distortion coefficients you already have.
rvec, tvec: these will be the outputs.
useExtrinsicGuess: false
flags: you can use CV_ITERATIVE (called SOLVEPNP_ITERATIVE in OpenCV 3 and later)
Finally, get R from rvec with the Rodrigues function.
You will need at least 3 non-collinear points with corresponding 3D-2D coordinates for solvePnP to work, but more is better. To get good-quality points, you could print a big chessboard pattern, put it flat on the floor, and use it as a grid. What's important is that the pattern is not too small in the image (the larger it appears, the more stable your calibration will be).
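Putting those steps together, here is a minimal Python sketch (the point coordinates and the intrinsics are made-up placeholders; cv2.solvePnP and cv2.Rodrigues are the actual API calls):

import numpy as np
import cv2

# Placeholder intrinsics from your calibration step (replace with your own values).
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

# Hypothetical ground points with known world coordinates, e.g. the corners of a
# 1 m x 1 m square marked on the floor (Z = 0 on the ground plane).
object_points = np.array([[0.0, 0.0, 0.0],
                          [1.0, 0.0, 0.0],
                          [1.0, 1.0, 0.0],
                          [0.0, 1.0, 0.0]])

# Their pixel coordinates in the image, in the same order (placeholder values).
image_points = np.array([[320.0, 410.0],
                         [510.0, 405.0],
                         [500.0, 300.0],
                         [330.0, 305.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs,
                              useExtrinsicGuess=False,
                              flags=cv2.SOLVEPNP_ITERATIVE)

R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation matrix
T = tvec                     # 3x1 translation vector, so that Xc = R @ Xw + T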
And, very important: for the intrinsic calibration you used a chess pattern with squares of a certain size, but you told the algorithm (which internally does something like solvePnP for each pattern view) that the size of each square is 1. This is not explicit, but it is done in line 10 of the sample code, where the grid is built with coordinates 0,1,2,...:
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
And the scale of the world for the extrinsic calibration must match this, so you have several possibilities:
Use the same scale, for example by using the same grid or by measuring the coordinates of your "world" plane in the same scale. In this case, your "world" won't be at the right scale.
Recommended: redo the intrinsic calibration with the right scale, something like:
objp[:,:2] = (size_of_a_square*np.mgrid[0:7,0:6]).T.reshape(-1,2)
where size_of_a_square is the real size of a square.
(I haven't done this, but it is theoretically possible; do it if you can't do option 2.) Reuse the intrinsic calibration by scaling fx and fy. This is possible because the camera sees everything up to a scale factor, and the declared size of a square only changes fx and fy (and the T in the pose for each square, but that's another story). If the actual size of a square is L, then replace fx and fy with L*fx and L*fy before calling solvePnP.

Related

Find the Transformation Matrix that maps 3D local coordinates to global coordinates

I'm coding a calibration algorithm for my depth camera. This camera outputs a one-channel 2D image with the distance of every object in the image.
From that image, and using the camera and distortion matrices, I was able to create a 3D point cloud from the camera's perspective. Now I wish to convert those 3D coordinates to global/world coordinates. But since I can't use a pattern like the chessboard to calibrate the camera, I need another alternative.
So I was thinking: if I provide some ground points (in the camera perspective), I would define a plane that I know should have a Z coordinate close to zero in the global perspective. So, how should I proceed to find the transformation matrix that makes this plane horizontal?
[Image: ground plane in local (camera) coordinates, with an object on top]
I tried using OpenCV's solvePnP, but it didn't give me the correct transformation. I also thought of using OpenCV's estimateAffine3D, but I don't know where the global coordinates should be mapped to, since the provided ground points do not need to lie on any specific pattern/shape.
Thanks in advance
What you need is what's commonly called extrinsic calibration: a rigid transformation relating the 3D camera reference frame to the 'world' reference frame. Usually, this is done by finding known 3D points in the world reference frame and their corresponding 2D projections in the image. This is what solvePnP does.
To find the best rotation/translation between two sets of 3D points, in the sense of minimizing the root mean square error, the solution is:
Theory: https://igl.ethz.ch/projects/ARAP/svd_rot.pdf
Easier explanation: http://nghiaho.com/?page_id=671
Python code (from the easier explanation site): http://nghiaho.com/uploads/code/rigid_transform_3D.py_
So, if you want to transform 3D points from the camera reference frame, do the following:
As you proposed, define some 3D points with known positions in the world reference frame, for example (but not necessarily) with Z=0. Put the coordinates in an Nx3 matrix P.
Get the corresponding 3D points in the camera reference frame. Put them in an Nx3 matrix Q.
Using the Python code linked above, call rigid_transform_3D(P, Q). This will return a 3x3 matrix R and a 3x1 vector t.
Then, for any 3D point in the world reference frame p, as a 3x1 vector, you can obtain the corresponding camera point, q with:
q = R.dot(p)+t
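For reference, here is a compact sketch of that SVD-based fit, written from the theory linked above rather than copied from the rigid_transform_3D.py file, so treat it as an illustration:

import numpy as np

def fit_rigid_transform(P, Q):
    """Least-squares R, t such that Q_i ~ R @ P_i + t, for Nx3 arrays P (world) and Q (camera)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)   # centroids
    H = (P - cP).T @ (Q - cQ)                 # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t                               # q = R @ p + t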
EDIT: answer when 3D position of points in world are unspecified
Indeed, for this procedure to work, you need to know (or better, to specify) the 3D coordinates of the points in your world reference frame. As stated in your comment, you only know the points are in a plane but don't have their coordinates in that plane.
Here is a possible solution:
Take the selected 3D points in camera reference frame, let's call them q'i.
Fit a plane to these points, for example as described in https://www.ilikebigbits.com/2015_03_04_plane_from_points.html. The result of this will be a normal vector n. To fully specify the plane, you need also to choose a point, for example the centroid (average) of q'i.
As the points surely don't perfectly lie in the plane, project them onto the plane, for example as described in: How to project a point onto a plane in 3D?. Let's call these projected points qi.
At this point you have a set of 3D points, qi, that lie on a perfect plane, which should correspond closely to the ground plane (z=0 in world coordinate frame). The coordinates are in the camera reference frame, though.
Now we need to specify an origin and the direction of the x and y axes in this ground plane. You don't seem to have any criteria for this, so an option is to arbitrarily set the origin just "below" the camera center, and align the X axis with the camera optical axis. For this:
Project the point (0,0,0) into the plane, as you did in step 4. Call this o. Project the point (0,0,1) into the plane and call it a. Compute the vector a-o, normalize it and call it i.
o is the origin of the world reference frame and i is the X axis of the world reference frame, in camera coordinates. Call j = n x i (cross product). j is the Y axis and we are almost finished.
Now, obtain the X-Y coordinates of the points qi in the world reference frame, by projecting them on i and j. That is, do the dot product between each qi and i to get the X values and the dot product between each qi and j to get the Y values. The Z values are all 0. Call these X, Y, 0 coordinates pi.
Use these values of pi and qi to estimate R and t, as in the first part of the answer!
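A rough sketch of that procedure in Python (untested, as the answer itself warns). It reuses the fit_rigid_transform helper sketched earlier, fits the plane with an SVD as a stand-in for the method linked in step 2, and measures the in-plane X, Y coordinates relative to the origin o:

import numpy as np

def project_onto_plane(pts, n, c):
    """Project points (3,) or (N, 3) onto the plane through c with unit normal n."""
    pts = np.atleast_2d(pts)
    return pts - ((pts - c) @ n)[:, None] * n

# q_cam: Nx3 ground points selected in the camera frame (placeholder values here).
q_cam = np.array([[0.1, 0.50, 2.0], [0.8, 0.55, 2.4],
                  [-0.4, 0.52, 2.2], [0.3, 0.48, 3.0]])

# Fit a plane: the unit normal is the smallest right singular vector of the centered points.
c = q_cam.mean(axis=0)
_, _, Vt = np.linalg.svd(q_cam - c)
n = Vt[-1] / np.linalg.norm(Vt[-1])

# Flatten the selected points onto that plane.
q = project_onto_plane(q_cam, n, c)

# World origin o: camera center (0,0,0) projected onto the plane.
# X axis i: direction of the projected optical axis; Y axis j = n x i.
o = project_onto_plane(np.zeros(3), n, c)[0]
a = project_onto_plane(np.array([0.0, 0.0, 1.0]), n, c)[0]
i = (a - o) / np.linalg.norm(a - o)
j = np.cross(n, i)

# World (X, Y, 0) coordinates of the flattened points, measured from o.
p = np.stack([(q - o) @ i, (q - o) @ j, np.zeros(len(q))], axis=1)

# Estimate the camera-from-world pose from the 3D-3D correspondences.
R, t = fit_rigid_transform(p, q)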
Maybe there is a simpler solution. Also, I haven't tested this, but I think it should work. Hope this helps.

finding the real world coordinates of an image point

I have been searching lots of resources on the internet for many days but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since it lies on a plane, all I need is its x and y position (not z). For this purpose I have chosen to go with image processing. The camera (single view, not stereo) position and orientation are fixed with respect to a reference coordinate system on the plane and are known.
I have detected the image pixel coordinates of the centers of the circles using OpenCV. All I need now is to convert these coordinates to the real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site, and others as well, the projective transformation is written as:
p = C[R|T]P, where P is the real-world coordinates and p is the pixel coordinates (both in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix and T is the translation vector. I have followed a tutorial on calibrating the camera with OpenCV (I applied the cameraCalibration source file); I have 9 good chessboard images, and as output I have the intrinsic camera matrix and the translation and rotation parameters for each image.
I have the 3x3 intrinsic camera matrix (focal lengths and principal point), and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 block and T is the right 3x1 column. According to the p = C[R|T]P formula, I assume that by multiplying these parameter matrices with P (world) we get p (pixel). But what I need is to project the p (pixel) coordinates back to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering; I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we could manipulate the transformation as P = [R|T]^-1 * C^-1 * p. However, this is in the Euclidean coordinate system, and I don't know whether such a thing is possible with homogeneous coordinates. Moreover, the 3x4 [R|T] matrix is not invertible, and I don't know whether this is the correct way to go at all.
The intrinsic and extrinsic parameters are known; all I need is the real-world coordinate on the ground plane of a point seen in the image. Since the point is on a plane, its coordinates will be 2-dimensional (depth is not important, unlike in general single-view geometry). The camera is fixed (position and orientation). How should I find the real-world coordinates of a point in an image captured by a (single-view) camera?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, in the Calibration -> Homography section, it is written: q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], s is an "up to" scale factor (related to the homography concept, I assume), q is the pixel coordinates and Q is the real-world coordinates. It says that, in order to get the real-world coordinates (on the chessboard plane) of an object detected in the image plane: since Z=0, the third column of W can be dropped (the rotation about that axis, I assume); after trimming this unnecessary part, W becomes a 3x3 matrix and H = MW is a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply it with q to get Q = [X Y 1], where the Z coordinate was trimmed.
I applied the mentioned algorithm and I got some results that cannot be between the image corners (the image plane was parallel to the camera plane, just ~30 cm in front of the camera, and I got results like 3000). The chessboard square sizes were entered in millimeters, so I assume the output real-world coordinates are in millimeters as well. Anyway, I am still trying things. By the way, the results were previously very, very large, but I divide all values in Q by the third component of Q to get (X, Y, 1).
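For reference, a hedged sketch of the book's algorithm as described above (K, rvec and tvec are assumed to come from the calibration of the view whose chessboard plane defines the world Z=0 plane; the numbers are placeholders):

import numpy as np
import cv2

# Placeholder inputs: intrinsics K (the book's M) and the pose rvec, tvec of the
# chessboard view whose plane defines Z=0 (same units as the square size, e.g. mm).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
rvec = np.array([0.1, -0.2, 0.05])
tvec = np.array([30.0, 40.0, 300.0])

R, _ = cv2.Rodrigues(rvec)
# Drop the third column of [R|T] because Z=0 on the plane: H maps (X, Y, 1) to pixels.
H = K @ np.column_stack((R[:, 0], R[:, 1], tvec))

# Go the other way: pixel (u, v) -> plane coordinates, dividing by the third
# homogeneous component (the "up to scale" factor s from the book).
u, v = 400.0, 260.0
Q = np.linalg.inv(H) @ np.array([u, v, 1.0])
X, Y = Q[0] / Q[2], Q[1] / Q[2]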
FINAL EDIT
I could not get the camera-calibration approach to work. Anyway, I should have started with perspective projection and transforms. This way I obtained very good estimates with a perspective transform between the image plane and the physical plane (the transform was generated from 4 pairs of corresponding coplanar points on the two planes), and then simply applied the transform to the image pixel points.
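That final approach maps directly onto OpenCV's homography functions; here is a minimal sketch with made-up point pairs:

import numpy as np
import cv2

# Four corresponding points: pixel coordinates in the image and their known
# positions on the physical plane (units of your choice; these are placeholders).
img_pts = np.float32([[120, 400], [520, 410], [480, 150], [160, 140]])
world_pts = np.float32([[0, 0], [600, 0], [600, 400], [0, 400]])   # e.g. mm on the plane

H = cv2.getPerspectiveTransform(img_pts, world_pts)

# Map detected circle centers (pixels) to plane coordinates.
centers = np.float32([[[300, 280]], [[410, 330]]])   # shape (N, 1, 2) for perspectiveTransform
plane_xy = cv2.perspectiveTransform(centers, H)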
You said "I have the intrinsic camera matrix, and translational and rotational params of each of the image", but these are the translation and rotation from your camera to your chessboard. They have nothing to do with your circle. However, if you really do have the translation and rotation, then getting the 3D point is easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C^-1 * [u, v, 1]^T, where u = col - w/2 and v = h/2 - row; col, row are the image column and row and w, h are the image width and height. As a result you will obtain a 3D point in so-called normalized camera coordinates, p = [x, y, z]^T. All you need to do now is subtract the translation and apply the transposed rotation: P = R^T (p - T). The order of operations is the inverse of the original one, which was rotate then translate; note that the transposed rotation performs the inverse of the original rotation but is much faster to compute than R^-1.

OpenCV - Projection, homography matrix and bird's eye view

I'd like to get the homography matrix for a bird's-eye view, and I know the projection matrix of the camera. Is there any relation between them?
Thanks.
A projection matrix is defined as the product of the camera's intrinsic matrix (focal length, principal point, etc.) and its extrinsic matrix (rotation and translation). The question is: with respect to what are your rotation and translation defined? For example, I can imagine another camera, or an object in 3D, with respect to which you have these rotations and translations. Otherwise your projection is just the intrinsic matrix.
Think first about the pieces of information you need to know to obtain a bird’s eye view: you need to know at least how your camera is oriented w.r.t ground surface. If you also know camera elevation you can create a metric reconstruction. But since you mentioned a homography, I assume that you consider a bird’s eye view of a flat surface since a homography maps the points on two flat surfaces, in your case the points on a flat ground to the points on your flat sensor.
Let’s consider a pinhole camera equation. It basically says that
[u, v, 1]^T ~ A * [R|t] * [x, y, z, 1]^T, where A is the camera intrinsic matrix. Now, since you deal with a ground plane, you can align a new coordinate system with it by setting z=0; [R|t] are the rotation and translation from this coordinate system into your camera-aligned system.
Next, note that [R|t] is a 3x4 matrix and it loses one dimension since z=0; it becomes 3x3, that is a homography H = A*[R'|t] (where R' is R with its third column dropped). OK, all we did was prove that a homography mapping exists between the ground and your sensor.
Now, you want another kind of homography: the one that relates points on the sensor before and after a pure camera rotation or zoom; that is, you want to rotate the camera down and possibly zoom out. Again, think in terms of the pinhole camera equation: originally you had H1 = A (here I dropped [R|t] as irrelevant for now), and after rotating your camera you have H2 = A*R; in other words, H1 is how you make your image now and H2 is how you want your image to look.
The relation between the two is what you want to find, H12, and it is also a homography, since homographies form a family of transformations (use this simple heuristic: what happens in the family stays in the family). Since the same surface can generate images with either H1 or H2, we can assemble H12 by undoing H1 (going back to the ground plane) and applying H2 (going from the ground to the bird's-eye sensor view); in a way this resembles operations with vectors, you just have to respect the order of matrix application, from right to left:
H12 = H2*H1^-1 = A*R*A^-1 = P*A^-1, where we substituted the expressions for H1 and H2, and finally the projection matrix P (in case you do have it).
This is your answer; if the rotation R is unknown, it can be guessed from the camera orientation w.r.t. the ground or calculated using solvePnP() from the OpenCV library. Finally, when I do this on a cell phone I just use its accelerometer readings as a good approximation, since when a cell phone is not accelerating the readings represent the gravity vector, which gives the rotation w.r.t. the flat horizontal ground.
When you plot your bird's-eye view as an image you will notice that its boundaries have turned from rectangular into a kind of trapezoid (due to the camera frustum shape) and that there are holes at distant locations (due to the insufficient sampling rate). You can interpolate inside the holes using warpPerspective().
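A hedged sketch of that H12 = A*R*A^-1 construction (the intrinsic matrix A and the downward pitch angle are placeholder values; swap in your own calibration and orientation):

import numpy as np
import cv2

# Placeholder intrinsic matrix A and a rotation tilting the camera about its x axis.
A = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])
pitch = np.deg2rad(45.0)
R, _ = cv2.Rodrigues(np.array([pitch, 0.0, 0.0]))

H12 = A @ R @ np.linalg.inv(A)   # homography from the current view to the rotated view

# Warp a frame into the (approximate) bird's-eye view; a synthetic image keeps this runnable.
img = np.zeros((720, 1280, 3), np.uint8)
bird = cv2.warpPerspective(img, H12, (img.shape[1], img.shape[0]))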

Consistency of projecting points onto an undistorted image

I want to project a point in 3D space into 2D image coordinates. I have the calibrated intrinsics and extrinsics of the camera I'm using. I have the camera matrix K and distortion coefficients D. However, I want the projected image coordinates to be of the undistorted image.
From my research, I found two ways to do this.
Use opencv's getOptimalNewCameraMatrix function to compute a new undistorted image's camera matrix K'. Then use this K' in opencv's projectPoints function, with the distortion parameters set to 0, to get the projected point.
Use the projectPoints function with the raw camera matrix K, along with the distortion coefficients D, and get the projected point.
Should the output of both methods match?
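In OpenCV terms, the two options look roughly like this (K, D, rvec, tvec and the 3D points are placeholder values standing in for your calibration results); note that option 1 yields coordinates in the image undistorted with K', while option 2 yields coordinates in the original, distorted image:

import numpy as np
import cv2

# Placeholder calibration results and a couple of 3D points in front of the camera.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
D = np.array([-0.2, 0.05, 0.0, 0.0, 0.0])
rvec = np.zeros(3)
tvec = np.array([0.0, 0.0, 5.0])
pts3d = np.float32([[0.1, 0.2, 1.0], [-0.3, 0.1, 2.0]])
w, h = 640, 480

# Option 1: new camera matrix K' and zero distortion.
K_new, _ = cv2.getOptimalNewCameraMatrix(K, D, (w, h), 0)
pts_option1, _ = cv2.projectPoints(pts3d, rvec, tvec, K_new, np.zeros(5))

# Option 2: original K together with the distortion coefficients D.
pts_option2, _ = cv2.projectPoints(pts3d, rvec, tvec, K, D)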
I think there is something missing in your reasoning.
The camera matrix K and the distortion coefficients D are the parameters used to undistort the image (if your lens distorts the image, as a fisheye does). They are what are called the intrinsic camera parameters.
If we switch terminology from computer vision to computer graphics, those parameters are the ones you use to define the frustum of the view; for example, they are used to get the focal length of the camera.
But they are not enough to do the projection.
For the projection, if you think in computer graphics terms (like OpenGL, for instance), you need the model-view-projection matrix. The model matrix specifies the position of the object in the world, the view matrix specifies the position of the camera, and the projection matrix specifies the frustum (focal angle, perspective distortion, etc.).
If you want to know how to transform the points of the model from 3D to 2D (or vice versa), you need the projection and the view matrices (you already have the model matrix, because you have the 3D points from which you want to start). In computer vision, the view matrix corresponds to the extrinsic parameters.
So you need the extrinsic parameters too, that is, the position of the camera in the world. Those parameters are, for instance, the rvec and tvec that cv::projectPoints needs.
If you want to compute them, they are exactly the output of cv::solvePnP, which does the opposite of what you want to do: from some known 3D points coupled with their known 2D projections on the camera screen, this function gives you the extrinsic parameters (from which you can get the view matrix for some OpenGL-OpenCV-augmented-reality-whatever application via cv::Rodrigues).
Last note: while the intrinsic parameters are fixed for all the pictures you shoot with a camera (as long as you don't change the focal length, of course), the extrinsic parameters change every time you move the camera to take a new picture from a different viewpoint (that is, this changes the perspective point of view, and therefore the 3D-2D projection you want to find).
Hope this helps!

Project 2d points in camera 1 image to camera 2 image after a stereo calibration

I am doing stereo calibration of two cameras (let's name them L and R) with opencv. I use 20 pairs of checkerboard images and compute the transformation of R with respect to L. What I want to do is use a new pair of images, compute the 2d checkerboard corners in image L, transform those points according to my calibration and draw the corresponding transformed points on image R with the hope that they will match the corners of the checkerboard in that image.
I tried the naive way of transforming the 2D points from [x, y] to [x, y, 1], multiplying by the 3x3 rotation matrix, adding the translation vector and then dividing by z, but the result is wrong, so I guess it's not that simple (?)
Edit (to clarify some things):
The reason I want to do this is basically because I want to validate the stereo calibration on a new pair of images. So, I don't actually want to get a new 2d transformation between the two images, I want to check if the 3d transformation I have found is correct.
This is my setup:
I have the rotation and translation relating the two cameras (E), but I don't have rotations and translations of the object in relation to each camera (E_R, E_L).
Ideally what I would like to do:
Choose the 2d corners in image from camera L (in pixels e.g. [100,200] etc).
Do some kind of transformation on the 2d points based on matrix E that I have found.
Get the corresponding 2d points in image from camera R, draw them, and hopefully they match the actual corners!
The more I think about it though, the more I am convinced that this is wrong/can't be done.
What I am probably trying now:
Using the intrinsic parameters of the cameras (let's say I_R and I_L), solve 2 least squares systems to find E_R and E_L
Choose 2d corners in image from camera L.
Project those corners to their corresponding 3d points (3d_points_L).
Do: 3d_points_R = (E_L).inverse * E * E_R * 3d_points_L
Get the 2d_points_R from 3d_points_R and draw them.
I will update when I have something new
It is actually easy to do, but you're making several mistakes. Remember that after stereo calibration, the resulting R and T relate the position and orientation of the second camera to the first camera, in the first camera's 3D coordinate system. And also remember that to find the 3D position of a point from a pair of cameras you need to triangulate it. By setting the z component to 1 you're making two mistakes. First, most likely you used the common OpenCV stereo calibration code and gave the distance between the corners of the checkerboard in cm. Hence z=1 means 1 cm away from the camera center, which is extremely close to the camera. Second, by setting the same z for all the points you are saying the checkerboard is perpendicular to the principal axis (aka optical axis, or principal ray), while most likely in your image that is not the case. So you're transforming some virtual 3D points to the second camera's coordinate system and then projecting them onto the image plane.
If you just want to transfer planar points, then you can find the homography between the two camera images (OpenCV has a function for that) and use it.
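If the checkerboard plane is all you need, a hedged sketch of that homography route, using the detected corners of a new image pair as the correspondences (file names and the pattern size are placeholders):

import cv2

imgL = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # placeholder file names
imgR = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
assert imgL is not None and imgR is not None, "replace with your own image pair"

pattern = (9, 6)   # inner-corner count of the checkerboard

okL, cornersL = cv2.findChessboardCorners(imgL, pattern)
okR, cornersR = cv2.findChessboardCorners(imgR, pattern)

if okL and okR:
    # The homography is only valid for points lying on the board's plane.
    H, _ = cv2.findHomography(cornersL, cornersR, cv2.RANSAC)
    # Map the corners detected in image L into image R and compare them with cornersR.
    projected = cv2.perspectiveTransform(cornersL, H)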
