OpenCV: get perspective matrix from translation & rotation

I'm trying to verify my camera calibration, so I'd like to rectify the calibration images. I expect that this will involve using a call to warpPerspective but I do not see an obvious function that takes the camera matrix, and the rotation and translation vectors to generate the perspective matrix for this call.
Essentially I want to do the process described here (see especially the images towards the end) but starting with a known camera model and pose.
Is there a straightforward function call that takes the camera intrinsic and extrinsic parameters and computes the perspective matrix for use in warpPerspective?
I'll be calling warpPerspective after having called undistort on the image.
In principle, I could derive the solution by solving the system of equations defined at the top of the OpenCV camera calibration documentation after specifying the constraint Z=0, but I figure that there must be a canned routine that will allow me to orthorectify my test images.
In my searches, I'm finding it hard to wade through all of the stereo calibration results -- I only have one camera, but want to rectify the image under the constraint that I'm only looking at a planar test pattern.

Actually there is no need to involve an orthographic camera. Here is how you can get the appropriate perspective transform.
If you calibrated the camera using cv::calibrateCamera, you obtained a camera matrix K, a vector of lens distortion coefficients D for your camera, and, for each image that you used, a rotation vector rvec (which you can convert to a 3x3 matrix R using cv::Rodrigues, doc) and a translation vector T. Consider one of these images and the associated R and T. After you call cv::undistort using the distortion coefficients, the image will be as if it were acquired by a camera with projection matrix K * [ R | T ].
Basically (as @DavidNilosek intuited), you want to cancel the rotation and get the image as if it were acquired by a projection matrix of the form K * [ I | -C ], where C = -R.inv()*T is the camera position. For that, you have to apply the following transformation:
Hr = K * R.inv() * K.inv()
The only potential problem is that the warped image might go outside the visible part of the image plane. Hence, you can use an additional translation to solve that issue, as follows:
     [ 1  0 |         ]
Ht = [ 0  1 | -K*C/Cz ]
     [ 0  0 |         ]
where Cz is the component of C along the Oz axis.
Finally, with the definitions above, H = Ht * Hr is a rectifying perspective transform for the considered image.
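For illustration, here is a minimal Python sketch of that recipe, assuming K, dist, rvec and tvec come from cv2.calibrateCamera for the image in question (the image path and variable names are mine):
import cv2
import numpy as np

img = cv2.imread("calib_image.png")            # hypothetical calibration image
img_u = cv2.undistort(img, K, dist)            # remove lens distortion first

R, _ = cv2.Rodrigues(rvec)                     # 3x3 rotation matrix
C = -R.T @ tvec.reshape(3, 1)                  # camera position in world coordinates

Hr = K @ R.T @ np.linalg.inv(K)                # cancels the rotation
Ht = np.eye(3)
Ht[:2, 2] = (-K @ C / C[2, 0]).ravel()[:2]     # one way to apply the -K*C/Cz shift so the result stays visible
H = Ht @ Hr

h, w = img_u.shape[:2]
rectified = cv2.warpPerspective(img_u, H, (w, h))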

This is a sketch of what I mean by "solving the system of equations" (in Python):
import cv2
import scipy  # I use scipy by habit; numpy would be fine too

# rvec = the rotation vector
# tvec = the translation vector
# A    = the camera intrinsic matrix

def unit_vector(v):
    return v / scipy.sqrt(scipy.sum(v * v))

(fx, fy) = (A[0, 0], A[1, 1])
Ainv = scipy.array([[1.0/fx,    0.0, -A[0, 2]/fx],
                    [   0.0, 1.0/fy, -A[1, 2]/fy],
                    [   0.0,    0.0,         1.0]], dtype=scipy.float32)
R, _ = cv2.Rodrigues(rvec)   # cv2.Rodrigues returns (rotation matrix, jacobian)
Rinv = scipy.transpose(R)
u = scipy.dot(Rinv, tvec)    # displacement between camera and world coordinate origin, in world coordinates

# corners of the image, hard coded here
pixel_corners = [scipy.array(c, dtype=scipy.float32)
                 for c in [(0+0.5, 0+0.5, 1), (0+0.5, 640-0.5, 1),
                           (480-0.5, 640-0.5, 1), (480-0.5, 0+0.5, 1)]]
scene_corners = []
for c in pixel_corners:
    lhat = scipy.dot(Rinv, scipy.dot(Ainv, c))  # direction of the ray that the corner images, in world coordinates
    s = u[2] / lhat[2]
    # now we have the case that (s*lhat - u)[2] == 0,
    # i.e. s is how far along the line of sight we need
    # to move to get to the Z == 0 plane.
    g = s * lhat - u
    scene_corners.append((g[0], g[1]))

# now we have: 4 pixel_corners (image coordinates) and 4 corresponding scene_corners (world coordinates);
# we can call cv2.getPerspectiveTransform on them and so on...
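As a hypothetical continuation of that sketch (not part of the original answer), you could map the scene corners to output pixel coordinates and warp; the scale factor and the (row, col) ordering below are assumptions you would adapt to your own setup:
import cv2
import numpy as np

scale = 100.0                                                  # output pixels per world unit, chosen arbitrarily
dst_pts = np.float32([[scale * x, scale * y] for (x, y) in scene_corners])
dst_pts -= dst_pts.min(axis=0)                                 # shift so all destination points are non-negative
src_pts = np.float32([[c[1], c[0]] for c in pixel_corners])    # swap to (col, row) order expected by OpenCV

H = cv2.getPerspectiveTransform(src_pts, dst_pts)
out_w = int(dst_pts[:, 0].max()) + 1
out_h = int(dst_pts[:, 1].max()) + 1
rectified = cv2.warpPerspective(undistorted_image, H, (out_w, out_h))  # undistorted_image assumed available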

Related

how to match rgb image pixels with corresponding pointcloud points

I have a color image and the corresponding point cloud captured by an OAK-D camera (see the image below), and I want to relate the pixels in the color image to the corresponding values in the point cloud.
How can I get this information? For instance, given a pixel location (200, 250) in the color image, how do I find the corresponding point in the point cloud?
Any help would be appreciated.
It sounds like you want to project a 2D image to a 3D point cloud using the computed disparity map. To do this you will also need to know your camera intrinsics. Since you are using the OAK-D, you should be able to get everything you need with the following piece of code.
with dai.Device(pipeline) as device:
    calibData = device.readCalibration()
    # get right intrinsic matrix
    w, h = monoRight.getResolutionSize()
    K_right = calibData.getCameraIntrinsics(dai.CameraBoardSocket.RIGHT, dai.Size2f(w, h))
    # get left intrinsic matrix
    w, h = monoLeft.getResolutionSize()
    K_left = calibData.getCameraIntrinsics(dai.CameraBoardSocket.LEFT, dai.Size2f(w, h))
    R_left = calibData.getStereoLeftRectificationRotation()
    R_right = calibData.getStereoRightRectificationRotation()
    x_baseline = calibData.getBaselineDistance()
Once you have all your camera parameters, you should be able to use OpenCV to approach this.
First you will need to construct the Q matrix (the 4x4 disparity-to-depth mapping matrix produced by cv2.stereoRectify).
You will need to provide
The left and right intrinsic calibration matrices
The Translation vector from the coordinate system of the first camera to the second camera
The Rotation matrix from the coordinate system of the first camera to the second camera
Here's a coded example:
import numpy as np
import cv2

# imageSize = (w, h) of the images used above
# stereoRectify expects a 3-element translation; the OAK-D baseline is along the x axis
T = np.array([x_baseline, 0.0, 0.0])

R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    cameraMatrix1=np.array(K_left),    # left intrinsic matrix
    cameraMatrix2=np.array(K_right),   # right intrinsic matrix
    distCoeffs1=np.zeros(5),           # assuming distortion has already been corrected
    distCoeffs2=np.zeros(5),
    imageSize=imageSize,               # pass in the image size
    R=np.array(R_left),                # rotation matrix from camera 1 to camera 2
    T=T)                               # translation vector from camera 1 to camera 2
Next you will need to reproject the image to 3D, using the known disparity map and the Q matrix. (The underlying operation multiplies each (x, y, disparity, 1) vector by Q, but OpenCV makes this much easier.)
xyz = cv2.reprojectImageTo3D(disparity, Q)
This will give you an array of 3D points. This array has the shape (rows, columns, 3), where the 3 corresponds to the (x, y, z) coordinate of the point cloud. Now you can use a pixel location to index into xyz and find its corresponding (x, y, z) point.
pix_row = 200
pix_col = 250
point_cloud_coordinate = xyz[pix_row, pix_col, :]
See the docs for more details
cv2.stereoRectify()
cv2.reprojectImageTo3D()

One single point wrongly triangulated (calibration with a 4 points board)

I'm filming a scene with 6 RGB cameras that I want to reconstruct in 3D, kind of like in the following picture. But I forgot to bring a calibration chessboard, so I used a blank rectangular board instead and filmed it as I would film a regular chessboard.
First step, calibration --> OK.
I obviously couldn't use cv2.findChessboardCorners, so I made a small program that lets me click and store the location of each of the 4 corners. I calibrated from these 4 points for about 10-15 frames as a test.
Tl;Dr: It seemed to work great.
Next step, triangulation. --> NOT OK
I use direct linear transform (DLT) to triangulate my points from all 6 cameras.
Tl;Dr: It's not working so well.
Image and world coordinates are connected by x = P X (in homogeneous coordinates, with P the 3x4 projection matrix of each camera), which can be rewritten as a homogeneous linear system A Q = 0 in the unknown world point Q.
A singular value decomposition (SVD) of A gives the solution as the right singular vector associated with the smallest singular value.
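(For reference, this is the kind of DLT triangulation being described; a minimal sketch of my own, assuming a list of 3x4 projection matrices Ps and the matching (u, v) pixel coordinates pts, one per camera:)
import numpy as np

def triangulate_dlt(Ps, pts):
    # Triangulate one 3D point from N >= 2 views by DLT.
    # Ps : list of 3x4 projection matrices, pts : list of (u, v) pixel coordinates
    A = []
    for P, (u, v) in zip(Ps, pts):
        A.append(u * P[2] - P[0])   # each view contributes two equations a . Q = 0
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    Q = Vt[-1]                      # right singular vector with the smallest singular value
    return Q[:3] / Q[3]             # dehomogenize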
3 of the 4 points are correctly triangulated, but the blue one that should lie on the origin has a wrong x coordinate.
WHY?
Why only one point, and why only the x coordinate?
Does it have anything to do with the fact that I calibrate from a 4 points board?
If so, can you explain; and if not, what else could it be?
Update: I tried another frame where the board is somewhere else, and the triangulation is fine.
So there is the mystery: some points are randomly triangulated wrong (or at least the one at the origin), while most of the others are fine. Again, why?
My guess is that it comes from the triangulation rather than from the calibration, and that there is no connection with my sloppy calibration process.
One common issue I came across is the ambiguity in the solutions found by DLT. Indeed, solving A Q = 0 or solving (A C)(C^-1 Q) = 0 gives the same solution. See page 46 here. But I don't know what to do about it.
I'm now fairly sure this is not a calibration issue but I don't want to delete this part of my post.
I used ret, K, D, R, T = cv2.calibrateCamera(objpoints, imgpoints, imSize, None, None). It worked seamlessly, and the points were perfectly reprojected onto my original image with cv2.projectPoints(objpoints, R, T, K, D).
I computed my projection matrix P as P = K [R|T], with R, _ = cv2.Rodrigues(R).
How is it that I get a solution while I have only 4 points per image? Wouldn't I need at least 6 of them? We have x = P X. We can solve for P by SVD under the form A p = 0. This gives 2 equations per point, for 11 independent unknown parameters of P, so 4 points make 8 equations, which shouldn't be enough. And yet cv2.calibrateCamera still gives a solution. It must be using another method? I came across Perspective-n-Point (PnP); is that what OpenCV uses? In that case, is it directly optimizing K, R and T and thus needs fewer points? I could artificially add a few points to get more than the 4 corner points of my board (for example, the centers of the edges, or the center of the rectangle). But is that really the issue?
When calibrating, one needs to decompose the projection matrix into intrinsic and extrinsic matrices. But this decomposition is not unique and has 4 solutions. See the section "I'm seeing double" and Chapter 21 of Hartley & Zisserman about cheirality for more information. It is not my issue, since my camera points are correctly reprojected to the image plane and my cameras are correctly set up in my 3D scene.
I did not quite understand what you are asking; it is rather vague. However, I think you are miscalculating your projection matrix.
If I'm not mistaken, you would define the 4 3D points representing your rectangle in real-world space, for example like this:
import numpy as np

pt_3D = np.array([[0, 0, 0],
                  [0, 1, 0],
                  [1, 1, 0],
                  [1, 0, 0]], dtype=np.float32)   # calibrateCamera expects float32 object points
You will then retrieve the corresponding 2D points (in the same order) from each image, and generate two lists as follows:
objpoints = [pt_3D, pt_3D, ....]            # N times
imgpoints = [pt_2D_img1, pt_2D_img2, ....]  # N times (N images)
You can then calibrate your camera and recover the camera poses as well as the projection matrices as follows:
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, imSize, None, None)

for rvec, tvec in zip(rvecs, tvecs):
    # sanity check: reproject the board corners for this image
    reprojected, _ = cv2.projectPoints(pt_3D, rvec, tvec, mtx, dist)

    Rt, _ = cv2.Rodrigues(rvec)        # world -> camera rotation
    R = Rt.T                           # camera -> world rotation
    T = -R @ tvec                      # camera position in world coordinates
    pose_Matrix = np.vstack((np.hstack((R, T.reshape(3, 1))), [0, 0, 0, 1]))   # transformation matrix == camera pose
    Projection_Matrix = mtx @ np.linalg.inv(pose_Matrix)[:3, :4]               # world -> image projection
You don't have to apply the DLT or the triangulation yourself (it is all done inside the cv2.calibrateCamera() function), and the 3D points remain whatever you defined yourself.
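As a small, hypothetical check of the projection matrix built above, you can project one of the board corners by hand and compare it with the clicked pixel:
X = np.array([1.0, 1.0, 0.0, 1.0])      # one board corner in homogeneous world coordinates
x = Projection_Matrix @ X               # (u*w, v*w, w)
u, v = x[0] / x[2], x[1] / x[2]         # pixel coordinates; should match the detected corner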

extrinsic matrix computation with opencv

I am using OpenCV to calibrate my webcam. I fixed the webcam to a rig so that it stays static, used a chessboard calibration pattern that I moved in front of the camera, and used the detected points to compute the calibration, as in many OpenCV examples (https://docs.opencv.org/3.1.0/dc/dbb/tutorial_py_calibration.html).
Now, this gives me the camera intrinsic matrix and a rotation and translation component for mapping each of these chessboard views from the chessboard space to world space.
However, what I am interested in is the global extrinsic matrix: once I have removed the checkerboard, I want to be able to specify a point in the image scene (its x, y and height) and get its position in world space. As far as I understand, I need both the intrinsic and extrinsic matrices for this. How should one proceed to compute the extrinsic matrix from here? Can I use the measurements that I have already gathered from the chessboard calibration step to compute the extrinsic matrix as well?
Let me give some context. Consider the following picture (from https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html):
The camera has "attached" a rigid reference frame (Xc,Yc,Zc). The intrinsic calibration that you successfully performed allows you to convert a point (Xc,Yc,Zc) into its projection on the image (u,v), and a point (u,v) in the image to a ray in (Xc,Yc,Zc) (you can only get it up to a scaling factor).
In practice, you want to place the camera in an external "world" reference frame, let's call it (X,Y,Z). Then there is a rigid transformation, represented by a rotation matrix, R, and a translation vector T, such that:
| Xc |       | X |
| Yc | = R * | Y | + T
| Zc |       | Z |
That's the extrinsic calibration (which can be written also as a 4x4 matrix, that's what you call the extrinsic matrix).
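(In code, that 4x4 extrinsic matrix is simply the following; a sketch assuming R and T are already available as NumPy arrays:)
import numpy as np

# homogeneous extrinsic matrix: [Xc, Yc, Zc, 1]^T = E @ [X, Y, Z, 1]^T
E = np.vstack((np.hstack((R, T.reshape(3, 1))), [0, 0, 0, 1]))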
Now, the answer. To obtain R and T, you can do the following:
Fix your world reference frame, for example the ground can be the (x,y) plane, and choose an origin for it.
Set some points with known coordinates in this reference frame, for example, points in a square grid in the floor.
Take a picture and get the corresponding 2D image coordinates.
Use solvePnP to obtain the rotation and translation, with the following parameters:
objectPoints: the 3D points in the world reference frame.
imagePoints: the corresponding 2D points in the image in the same order as objectPoints.
cameraMatrix: the intrinsic matrix you already have.
distCoeffs: the distortion coefficients you already have.
rvec, tvec: these will be the outputs.
useExtrinsicGuess: false
flags: you can use CV_ITERATIVE (cv2.SOLVEPNP_ITERATIVE in current Python versions)
Finally, get R from rvec with the Rodrigues function.
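A minimal Python sketch of those steps, assuming objectPoints and imagePoints are the Nx3 and Nx2 float arrays described above, and K and dist are the intrinsics you already have:
import cv2
import numpy as np

# objectPoints: Nx3 float32 array of known world coordinates (e.g. grid points on the floor)
# imagePoints : Nx2 float32 array of the matching pixel coordinates, in the same order
ok, rvec, tvec = cv2.solvePnP(objectPoints, imagePoints, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation: world -> camera
T = tvec                     # translation, so that Xc = R @ X + T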
You will need at least 3 non-collinear points with corresponding 3D-2D coordinates for solvePnP to work (link), but more is better. To have good quality points, you could print a big chessboard pattern, put it flat in the floor, and use it as a grid. What's important is that the pattern is not too small in the image (the larger, the more stable your calibration will be).
And, very important: for the intrinsic calibration, you used a chess pattern with squares of a certain size, but you told the algorithm (which does a kind of solvePnP for each pattern view) that the size of each square is 1. This is not explicit, but is done in line 10 of the sample code, where the grid is built with coordinates 0, 1, 2, ...:
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
And the scale of the world for the extrinsic calibration must match this, so you have several possibilities:
Use the same scale, for example by using the same grid or by measuring the coordinates of your "world" plane in the same scale. In this case, your "world" won't be at the right scale.
Recommended: redo the intrinsic calibration with the right scale, something like:
objp[:,:2] = (size_of_a_square*np.mgrid[0:7,0:6]).T.reshape(-1,2)
Where size_of_a_square is the real size of a square.
(I haven't done this, but it is theoretically possible; do it only if you can't do option 2.) Reuse the intrinsic calibration by scaling fx and fy. This is possible because the camera sees everything up to a scale factor, and the declared size of a square only changes fx and fy (and the T in the pose for each view of the pattern, but that's another story). If the actual size of a square is L, then replace fx and fy with L*fx and L*fy before calling solvePnP.

What do orthogonal matrices have to do with image processing?

I am trying to map mathematics to image processing. I am very much a beginner in math. I read what an orthogonal matrix is from this link: http://people.revoledu.com/kardi/tutorial/LinearAlgebra/MatrixOrthogonal.html
How can I relate orthogonal matrices to image processing, or to any other application?
I wouldn't spend too much time thinking about it. Nonetheless.
The main place where we see orthogonal matrices in graphics/image processing is rotation matrices. For example, if I want to rotate an image by t radians, I would (conceptually) transform that image by the matrix
[ cos(t) , -sin(t) ]
[ sin(t) , cos(t) ]
Meaning I would take a point [ x, y ] in my image, and transform it by that matrix to obtain [ x', y' ], the new location of that point. Were I to do this for every pixel in the image and place the new points on a blank canvas (that could fit them, obviously), I would see (roughly) the image rotated by t radians.
This is an example of an orthogonal matrix. To 'undo' the operation, I transform with the inverse of the matrix. But since this matrix is orthogonal, the inverse is just the transpose:
[ cos(t) , sin(t) ]
[ -sin(t) , cos(t) ]
If I applied that transform to my rotated image, I'd obtain the original (I'm glossing over details like filtering). That matrix represents the inverse operation of rotating by t radians. It is, in fact, a matrix for rotating by -t radians.
In this case it's actually very easy to see that: if you plug -t into the first matrix, since sin is an odd function and cos an even, you'll get exactly the transpose.
This is conceptually very simple compared to a general matrix inversion. We like orthogonal matrices because of that inverse-is-transpose property.
Just one practical example.
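A small NumPy illustration of that property (my own example, not from the original answer):
import numpy as np

t = np.deg2rad(30)                          # rotation angle in radians
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

p = np.array([2.0, 1.0])                    # a point in the image
p_rot = R @ p                               # rotate it
p_back = R.T @ p_rot                        # the transpose undoes the rotation

print(np.allclose(R.T @ R, np.eye(2)))      # True: R is orthogonal
print(np.allclose(p_back, p))               # True: transpose == inverse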

finding the real world coordinates of an image point

I have been searching lots of resources on the internet for many days but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since the object is on a plane, all I need is its x and y position (not z). For this purpose I have chosen to go with image processing. The camera (single view, not stereo) position and orientation are fixed with respect to a reference coordinate system on the plane and are known.
I have detected the image pixel coordinates of the centers of the circles using OpenCV. All I need now is to convert these coordinates to the real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site and others, a homographic transformation is written as:
p = C[R|T]P, where P is the real-world coordinate and p is the pixel coordinate (in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix and T is the translation vector. I followed a tutorial on calibrating the camera in OpenCV (the cameraCalibration source file); I have 9 good chessboard images, and as output I have the intrinsic camera matrix and the translation and rotation parameters of each image.
I have the 3x3 intrinsic camera matrix (focal lengths and center pixels) and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 block and T is the right 3x1 column. According to the p = C[R|T]P formula, I assume that by multiplying these parameter matrices with P (world) we get p (pixel). But what I need is to project the p (pixel) coordinates to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we could manipulate such a transformation as P = [R|T]^-1 * C^-1 * p. However, that is in a Euclidean coordinate system; I don't know whether such a thing is possible with homogeneous coordinates, and the 3x4 [R|T] matrix is not invertible anyway. Moreover, I don't know if it is the correct way to go.
The intrinsic and extrinsic parameters are known; all I need is the real-world coordinate of the point on the ground plane. Since the point is on a plane, its coordinates are 2-dimensional (depth is not important, as opposed to general single-view geometry). The camera is fixed (position and orientation). How should I find the real-world coordinate of a point in an image captured by a single-view camera?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, under the Calibration -> Homography section, it is written: q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], and s is an "up to" scale factor, which I assume is related to the homography concept (I don't know exactly); q is the pixel coordinate and Q is the real-world coordinate. It says that to get the real-world coordinate (on the chessboard plane) of a point detected on the image plane: since Z = 0, the third column of W (that axis's rotation, I assume) can be dropped; trimming this unnecessary part, W becomes a 3x3 matrix and H = MW is a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply q by it to get Q = [X Y 1], where the Z coordinate was trimmed.
I applied the mentioned algorithm and I got some results that cannot lie between the image corners (the image plane was parallel to the camera plane, just ~30 cm in front of the camera, and I got results like 3000; chessboard square sizes were entered in millimeters, so I assume the output real-world coordinates are again in millimeters). Anyway, I am still trying things. By the way, the raw results are very, very large, but I divide all values in Q by the third component of Q to get (X, Y, 1).
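(A short Python sketch of that book recipe, my own illustration: M, R, T are assumed to come from the calibration of one view of the ground plane, and u, v are the pixel coordinates of the detected point.)
import numpy as np

W = np.column_stack((R[:, 0], R[:, 1], T.ravel()))   # drop the third column of [R|T] since Z = 0
H = M @ W                                            # 3x3 homography: ground plane -> image
Hinv = np.linalg.inv(H)

q = np.array([u, v, 1.0])                            # pixel coordinates, homogeneous
Q = Hinv @ q
Q /= Q[2]                                            # normalize so Q = [X, Y, 1]
X, Y = Q[0], Q[1]                                    # coordinates on the ground plane (board units)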
FINAL EDIT
I could not get the camera-calibration-based methods to work. Anyway, I should have started with perspective projection and transforms. That way I got very good estimates with a perspective transform between the image plane and the physical plane (generating the transform from 4 pairs of corresponding coplanar points on the two planes), and then simply applied the transform to the image pixel points.
You said "I have the intrinsic camera matrix, and translational and rotational params of each of the images", but these are the translation and rotation from your camera to your chessboard. They have nothing to do with your circle. However, if you really have the translation and rotation matrices, then getting the 3D point is really easy.
Apply the inverse intrinsic matrix to your screen point in homogeneous notation: C^-1 * [u, v, 1]^T, where u = col - w/2 and v = h/2 - row (col, row are the image column and row, and w, h are the image width and height). As a result you will obtain a 3D point in so-called camera-normalized coordinates, p = [x, y, z]^T. All you need to do now is to subtract the translation and apply a transposed rotation: P = R^T (p - T). The order of operations is the inverse of the original, which was rotate and then translate; note that the transposed rotation performs the inverse of the original rotation but is much faster to compute than R^-1.
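A Python sketch of those steps, with one extra step: C^-1 * [u, v, 1]^T only gives a ray up to scale, so here the scale is fixed by intersecting the ray with the ground plane Z = 0 (C, R, T come from the calibration; u, v are the pixel coordinates of the detected circle center; variable names are mine):
import numpy as np

ray_cam = np.linalg.inv(C) @ np.array([u, v, 1.0])   # viewing ray direction in camera coordinates
cam_pos = -R.T @ T.ravel()                           # camera center in world coordinates
ray_world = R.T @ ray_cam                            # ray direction in world coordinates

s = -cam_pos[2] / ray_world[2]                       # scale that brings the ray to the Z = 0 plane
P = cam_pos + s * ray_world                          # real-world point on the ground plane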
