Align RGB image to Depth Image using Intrinsic and Extrinsic Matrix

Similar questions have been solved many times. However, they generally map depth coordinates to RGB coordinates by following these steps:
apply the inverse depth intrinsic matrix to the depth coordinates.
rotate and translate the resulting 3D coordinates using the rotation R and translation T that map 3D depth coordinates to 3D RGB coordinates.
apply the RGB intrinsic matrix to obtain the image coordinates (a sketch of this forward mapping is given right after this list).
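A minimal sketch of that forward mapping for a single depth pixel, assuming undistorted pinhole models; K_depth, K_rgb, R, T, and all numeric values below are placeholders:

```python
import numpy as np

# Placeholder calibration values; substitute your own.
K_depth = np.array([[580.0, 0.0, 320.0],
                    [0.0, 580.0, 240.0],
                    [0.0, 0.0, 1.0]])
K_rgb = np.array([[520.0, 0.0, 310.0],
                  [0.0, 520.0, 235.0],
                  [0.0, 0.0, 1.0]])
R = np.eye(3)                      # depth camera -> RGB camera rotation
T = np.array([0.025, 0.0, 0.0])    # depth camera -> RGB camera translation (m)

u_d, v_d = 400, 300                # depth pixel
z = 1.2                            # depth value at that pixel, in meters

# 1. Back-project the depth pixel to a 3D point in the depth camera frame.
p_depth = z * np.linalg.inv(K_depth) @ np.array([u_d, v_d, 1.0])

# 2. Transform into the RGB camera frame.
p_rgb = R @ p_depth + T

# 3. Project with the RGB intrinsics and normalize by depth.
uvw = K_rgb @ p_rgb
u_rgb, v_rgb = uvw[0] / uvw[2], uvw[1] / uvw[2]
print(u_rgb, v_rgb)
```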
However, I want to do the reverse: from RGB coordinates, obtain the depth coordinates, so that I can then interpolate a value from the depth map at those coordinates.
The problem is that I don't know how to define the z coordinate in the RGB image to make everything work.
The process should be:
obtain 3D RGB coordinates by applying the camera's inverse intrinsic matrix. How can I set the z coordinates? Should I use an estimated value? Set all the z coordinates to one?
rotate and translate the 3D RGB coordinates to 3D depth coordinates.
apply the depth intrinsic matrix.
If this process cannot be done, how can I map RGB coordinates to depth coordinates instead of the other way around?
Thank you!

Related

Aruco Cube Pose Estimation

I want to detect the pose of the cube (see image). I have the coordinates of the centers of all markers, and I want to draw axes at the origin, or an outline of the whole cube, using the 2D detection points and 3D object points. I am using OpenCV, and I tried to draw the axes on the image, but the detected axes are not correct.
(Image: Aruco cube)
I am using cv2.solvePnP to get the rotation and translation vectors from the detected 2D-3D point pairs.
(Image: detected axes)
For the 3D object points, the origin is at one of the corners of the cube. The size of the cube is 20x20x20 cm.
Is it possible to do this with OpenCV? Are there any other Python libraries that could help achieve the desired result?
TIA
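For reference, a minimal sketch of the solvePnP-based approach described above; the intrinsics, detections, and file name are placeholders (here the four corners of one cube face are used as object points, so they are coplanar):

```python
import cv2
import numpy as np

# Placeholder intrinsics; substitute your calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assuming no lens distortion for this sketch

# 3D object points in meters: corners of one face of the 0.20 m cube,
# origin at one corner.
object_points = np.array([[0.0, 0.0, 0.0],
                          [0.2, 0.0, 0.0],
                          [0.2, 0.2, 0.0],
                          [0.0, 0.2, 0.0]])

# Matching 2D detections in pixels (placeholder values).
image_points = np.array([[310.0, 250.0],
                         [420.0, 255.0],
                         [430.0, 360.0],
                         [305.0, 355.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)

img = cv2.imread("cube.jpg")                      # hypothetical input image
# Draw the pose axes at the object origin (axis length 0.1 m).
# drawFrameAxes needs OpenCV >= 4; older builds used cv2.aruco.drawAxis.
cv2.drawFrameAxes(img, K, dist, rvec, tvec, 0.1)

# Outline the cube: project all 8 corners and connect them with lines.
corners = np.array([[x, y, z] for x in (0.0, 0.2)
                              for y in (0.0, 0.2)
                              for z in (0.0, 0.2)])
proj, _ = cv2.projectPoints(corners, rvec, tvec, K, dist)
```

If the drawn axes still look wrong, a common cause is that the object points and image points are not listed in the same order.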

How to estimate intrinsic properties of a camera from data?

I am attempting camera calibration from a single RGB image (a panorama), given a 3D point cloud.
The methods that I have considered all require an intrinsic parameter matrix, which I have no access to.
The intrinsic matrix can be estimated using Bouguet's Camera Calibration Toolbox, but as I said, I have only a single image and a single point cloud for that image.
So, knowing the 2D image coordinates, the extrinsic parameters, and the 3D world coordinates, how can the intrinsic parameters be estimated?
It would seem that the initCameraMatrix2D function from OpenCV (https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html) works in the same way as Bouguet's toolbox and requires multiple images of the same object.
I am looking into the Direct Linear Transformation (DLT) and the Levenberg-Marquardt algorithm, with implementations at https://drive.google.com/file/d/1gDW9zRmd0jF_7tHPqM0RgChBWz-dwPe1,
but it would seem that both use the pinhole camera model and therefore find a linear transformation between 3D and 2D points.
I can't find my half-year-old source code, but from the top of my head:
cx, cy is the optical centre, which is width/2, height/2 in pixels;
fx = fy is the focal length in pixels (the distance from the camera to the image plane, or to the axis of rotation).
If you know that the distance from the camera to the image plane is, for example, 30 cm, and it captures an image of 16x10 cm at 1920x1200 pixels, then the pixel size is 100 mm / 1200 = 1/12 mm, the camera distance (fx, fy) would be 300 mm * 12 px/mm = 3600 px, and the image centre is cx = 1920/2 = 960, cy = 1200/2 = 600. I assume that the pixels are square and the camera sensor is centred on the optical axis.
You can also get the focal length from the image size in pixels and a measured angle of view.
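A small sketch of that last relation, fx = (width/2) / tan(FOV/2); the width and field-of-view values here are made up:

```python
import math

width_px = 1920              # image width in pixels
hfov_deg = 60.0              # measured horizontal angle of view (assumed value)

# Half the image width subtends half the field of view at the focal point.
fx = (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
cx, cy = 1920 / 2, 1200 / 2  # optical centre assumed at the image centre
print(fx)                    # ~1663 px for these numbers
```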

Difference between undistortPoints() and projectPoints() in OpenCV

From my understanding, undistortPoints takes a set of points on a distorted image, and calculates where their coordinates would be on an undistorted version of the same image. Likewise, projectPoints maps a set of object coordinates to their corresponding image coordinates.
However, I am unsure whether projectPoints maps the object coordinates to image points on the distorted image (i.e., the original image) or on one that has been undistorted (straight lines).
Furthermore, the OpenCV documentation for undistortPoints states that 'the function performs a reverse transformation to projectPoints()'. Could you please explain how this is so?
Quote from the 3.2 documentation for projectPoints():
Projects 3D points to an image plane.
The function computes projections of 3D points onto the image plane given intrinsic and extrinsic camera parameters.
You have the parameter distCoeffs:
If the vector is empty, the zero distortion coefficients are assumed.
With no distortion the equation is:
s * [u, v, 1]^T = K * [R | t] * [X, Y, Z, 1]^T
with K the intrinsic matrix and [R | t] the extrinsic matrix, i.e. the transformation that maps a point in the object or world frame to the camera frame.
For undistortPoints(), you have the parameter R:
Rectification transformation in the object space (3x3 matrix). R1 or R2 computed by cv::stereoRectify can be passed here. If the matrix is empty, the identity transformation is used.
The reverse transformation is the operation where you compute, for a 2D image point [u, v], the corresponding 3D point in the normalized camera frame [x, y, z=1] using the intrinsic parameters.
With the extrinsic matrix, you can get the point in the camera frame:
[Xc, Yc, Zc]^T = R * [X, Y, Z]^T + t
The normalized camera frame is obtained by dividing by the depth:
x = Xc / Zc, y = Yc / Zc, z = 1
Assuming no distortion, the image point is:
u = fx * x + cx, v = fy * y + cy
And the "reverse transformation", assuming no distortion, is:
x = (u - cx) / fx, y = (v - cy) / fy
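A round-trip sketch of the two functions, with made-up intrinsics and distortion coefficients; the point is given directly in the camera frame, so rvec and tvec are zero:

```python
import cv2
import numpy as np

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.array([0.1, -0.05, 0.0, 0.0, 0.0])  # example coefficients

pt3d = np.array([[0.1, -0.2, 1.5]])  # X, Y, Z in the camera frame

# projectPoints: 3D point -> 2D pixel on the *distorted* image.
img_pts, _ = cv2.projectPoints(pt3d, np.zeros(3), np.zeros(3), K, dist)

# undistortPoints: distorted 2D pixel -> normalized camera coordinates
# [x, y] with z = 1, i.e. the "reverse transformation" above.
norm_pts = cv2.undistortPoints(img_pts, K, dist)

print(norm_pts)  # ~[0.1/1.5, -0.2/1.5] = [0.0667, -0.1333]
```

This also answers the first part of the question: projectPoints applies the distortion model, so its output lies on the original (distorted) image; passing an empty distCoeffs gives the undistorted coordinates instead.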

Finding the real-world coordinates of an image point

I have been searching lots of resources on the internet for many days, but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since it is on a plane, all I need are the x and y positions (not z). For this purpose I have chosen to go with image processing. The camera (single view, not stereo) position and orientation are fixed with respect to a reference coordinate system on the plane, and both are known.
I have detected the image pixel coordinates of the centers of the circles using OpenCV. All I need now is to convert those coordinates to the real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site, and other sites as well, the projection is written as:
p = C[R|T]P, where P is the real-world coordinates and p is the pixel coordinates (in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix, and T is the translation matrix. I followed a tutorial on calibrating the camera in OpenCV (using the cameraCalibration source file); I have 9 good chessboard images, and as output I have the intrinsic camera matrix and the translational and rotational parameters for each image.
I have the 3x3 intrinsic camera matrix (focal lengths and center pixels) and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 and T is the right 3x1. According to the p = C[R|T]P formula, multiplying these parameter matrices by P (world) gives p (pixel). But what I need is the reverse: to project the p (pixel) coordinates back to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering, and I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we could invert a transformation as P = [R|T]^-1 * C^-1 * p. However, that is in a Euclidean coordinate system; I don't know whether the same is possible in homogeneous coordinates, and in any case the 3x4 [R|T] matrix is not invertible. I also don't know whether this is the correct way to go.
The intrinsic and extrinsic parameters are known; all I need is the projected real-world coordinate of the point on the ground plane. Since the point is on a plane, the coordinates are two-dimensional (depth does not matter, in contrast to the general single-view-geometry problem). The camera is fixed in position and orientation. How should I find the real-world coordinates of a point in an image captured by a single-view camera?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, under the Calibration -> Homography section, it is written: q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], and s is an "up to" scale factor (related to the homography concept, I assume; I don't know it clearly). q is the pixel coordinate and Q is the real coordinate. It says that in order to get the real-world coordinate (on the chessboard plane) of an object detected on the image plane, you set Z = 0; the third column of W is then multiplied by zero (the rotation about that axis, I assume) and can be dropped, so after trimming these unnecessary parts W becomes a 3x3 matrix, and H = MW is a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply it with q to get Q = [X Y 1], where the Z coordinate was trimmed.
I applied the algorithm described above, and I got results that cannot lie within the image corners (the image plane was parallel to the camera plane, just ~30 cm in front of the camera, and I got values like 3000). The chessboard square sizes were entered in millimeters, so I assume the output real-world coordinates are also in millimeters. Anyway, I am still trying things. By the way, the results were previously very, very large, but I now divide all values in Q by the third component of Q to get (X, Y, 1).
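A minimal sketch of that book method; M, R, and t below are placeholders for the calibration output of the chessboard/ground plane:

```python
import numpy as np

M = np.array([[800.0, 0.0, 320.0],     # intrinsic matrix (placeholder)
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                           # rotation, world -> camera (placeholder)
t = np.array([[0.0], [0.0], [500.0]])   # translation in mm (placeholder)

# Z = 0 on the plane, so the third column of R drops out: W = [r1 r2 t].
W = np.hstack([R[:, 0:1], R[:, 1:2], t])
H = M @ W                               # 3x3 homography, plane -> pixels

q = np.array([640.0, 480.0, 1.0])       # pixel in homogeneous coordinates
Q = np.linalg.inv(H) @ q
Q /= Q[2]                               # normalize so that Q = [X, Y, 1]
print(Q[:2])                            # point on the Z = 0 plane, in mm
```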
FINAL EDIT
I could not get the camera-calibration-based methods to work. In any case, I should have started with perspective projection and transforms. That way I got very good estimates using a perspective transform between the image plane and the physical plane (generating the transform from 4 pairs of corresponding coplanar points on the two planes), and then simply applied the transform to the image pixel points.
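A sketch of that final approach with OpenCV; all coordinate values below are placeholders for the 4 measured point pairs:

```python
import cv2
import numpy as np

img_pts = np.float32([[100, 120], [520, 110], [540, 400], [90, 410]])  # pixels
world_pts = np.float32([[0, 0], [300, 0], [300, 200], [0, 200]])       # mm on the plane

# Homography from the image plane to the physical plane.
H = cv2.getPerspectiveTransform(img_pts, world_pts)

# Map a detected circle centre (pixels) to plane coordinates.
centre = np.float32([[[310, 260]]])           # shape (1, 1, 2) as required
plane_xy = cv2.perspectiveTransform(centre, H)
print(plane_xy)                               # [[X, Y]] in the units of world_pts
```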
You said "i have the intrinsic camera matrix, and translational and rotational params of each of the image", but those are the translation and rotation from your camera to your chessboard; they have nothing to do with your circle. However, if you really do have the translation and rotation matrices, then getting the 3D point is easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C^-1 * [u, v, 1]^T, where u = col - w/2 and v = h/2 - row (col, row are the image column and row, and w, h are the image width and height). As a result you will obtain a 3D point in so-called camera normalized coordinates, p = [x, y, z]^T. All you need to do now is subtract the translation and apply a transposed rotation: P = R^T * (p - T). The order of operations is the inverse of the original (rotate, then translate); note that the transposed rotation performs the inverse of the original rotation but is much faster to compute than R^-1.
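A sketch of that recipe; w, h, the intrinsics C, and the extrinsics R, T are placeholders, and note that the resulting P lies on the viewing ray at the z = 1 normalized depth, not at the circle's true distance:

```python
import numpy as np

w, h = 1920, 1080
C = np.array([[1000.0, 0.0, 0.0],    # cx = cy = 0 here, because u and v
              [0.0, 1000.0, 0.0],    # are already centred below
              [0.0, 0.0, 1.0]])
R = np.eye(3)                         # placeholder rotation
T = np.array([0.0, 0.0, 1.0])         # placeholder translation

col, row = 960.0, 540.0               # detected circle centre in pixels
u, v = col - w / 2, h / 2 - row       # centre the coordinates (y flipped)

p = np.linalg.inv(C) @ np.array([u, v, 1.0])  # normalized camera coordinates
P = R.T @ (p - T)                             # P = R^T (p - T)
print(P)
```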

Is it possible to obtain fronto-parallel view of image or camera position by 2d-3d points relation?

Is it possible to obtain a fronto-parallel view of an image, or the camera position, from a 2D-3D point relation using OpenCV?
For this I have the intrinsic and extrinsic parameters. I also have the 3D coordinates of a set of control points (which lie in one plane) on the image (a 2D-3D relation).
In fact I need the location and orientation of the camera, but it is not difficult to find them if I can convert the image to a fronto-parallel view.
If this is not possible with OpenCV, are there other libraries that can solve the task?
The solution is based on the formulas in the OpenCV documentation, Camera Calibration and 3D Reconstruction.
Let's consider the numerical form without distortion coefficients (in contrast to the matrix form).
We have u and v.
It is easy to calculate x' and y': x' = (u - cx) / fx, y' = (v - cy) / fy.
But x and y cannot be calculated, because we can choose any non-zero z:
a line in 3D corresponds to one point in the 2D image.
To solve this, we take two points, for z = 1 and z = 2.
We then have 2 points in 3D space that specify a line: (x1, y1, z1) and (x2, y2, z2).
We can apply R^-1 to (x1, y1, z1) and (x2, y2, z2), which results in a line determined by the two points (X1, Y1, Z1) and (X2, Y2, Z2).
Since our control points lie in one plane (let the plane be Z = 0 for simplicity), we can find the corresponding X and Y where the line meets the plane, which is a point in 3D.
After applying the normalization from mm to pixels, we obtain a fronto-parallel image.
(If the input image is distorted, we should undistort it first.)
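A sketch of that construction; K, R, t, and the pixel are placeholders, and the translation t is also removed when going back to world coordinates (i.e. the full inverse of the extrinsic transform, which the text above leaves implicit):

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # placeholder extrinsics
t = np.array([0.0, 0.0, 2.0])

u, v = 400.0, 300.0              # image point
xp = (u - K[0, 2]) / K[0, 0]     # x' = (u - cx) / fx
yp = (v - K[1, 2]) / K[1, 1]     # y' = (v - cy) / fy

# Two points on the viewing ray, at z = 1 and z = 2 (camera frame).
p1 = np.array([xp, yp, 1.0])
p2 = 2.0 * p1

# Back to world coordinates: P = R^-1 (p - t).
Rinv = np.linalg.inv(R)
P1 = Rinv @ (p1 - t)
P2 = Rinv @ (p2 - t)

# Intersect the world-space line with the control-point plane Z = 0.
s = P1[2] / (P1[2] - P2[2])
X = P1 + s * (P2 - P1)
print(X)                          # Z component is ~0
```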
