OpenCV Stereo Photogrammetry: why is my Z axis not in line with the principal point?

As I understand OpenCV's coordinate system (shown in this diagram), the left camera of a calibrated stereo pair is located at the origin, facing the +Z direction.
I have a pair of 2464x2056 pixel cameras that I have calibrated (with a stereo RMS error of around 0.35), computed the disparity on a pair of images, and reprojected this to get the 3D pointcloud. However, I've noticed that the Z axis is not in line with the optical centre of the camera.
This does mess a bit with some of the pointcloud manipulation I'm hoping to do. Is this expected, or does it indicate that something has gone wrong along the way?
Below is the pointcloud I've generated, plus the axes: the red, green and blue lines indicate the X, Y and Z axes respectively, coming out from the origin.
As you can see, the Z axis intersects the pointcloud between the head and the post. This corresponds to a pixel coordinate of approximately x = 637, y = 1028 when I fix the principal point during calibration to cx = 1232, cy = 1028. When I remove the CV_FIX_PRINCIPAL_POINT flag, the principal point is calculated as approximately cx = 1310, cy = 1074, and the Z axis intersects at around x = 310, y = 1050.
Compared to the rectified image here, where the midpoint x = 1232, y = 1028 is marked by a yellow cross over the mannequin's head, the Z axis intersection is significantly off from where I would expect it.
Does anyone have any idea as to why this could be occurring? Any help would be greatly appreciated.
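For reference, one thing worth checking is which rectified pixel the Z axis is expected to pass through. Below is a minimal diagnostic sketch, assuming the usual cv2.stereoRectify / cv2.reprojectImageTo3D pipeline and hypothetical calibration values standing in for the real ones:

    import numpy as np
    import cv2

    # Hypothetical calibration results standing in for the output of cv2.stereoCalibrate.
    image_size = (2464, 2056)
    K1 = np.array([[2400.0, 0.0, 1232.0], [0.0, 2400.0, 1028.0], [0.0, 0.0, 1.0]])
    K2 = K1.copy()
    D1 = np.zeros(5)
    D2 = np.zeros(5)
    R = np.eye(3)
    T = np.array([[-100.0], [0.0], [0.0]])  # e.g. a 100 mm, purely horizontal baseline

    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)

    # In the Q matrix from stereoRectify, Q[0,3] = -cx' and Q[1,3] = -cy', where
    # (cx', cy') is the principal point of the *rectified* left camera.  A 3D point
    # with X = Y = 0 in the frame used by reprojectImageTo3D maps to exactly this
    # pixel, which can differ from the principal point of the original camera.
    cx_rect, cy_rect = -Q[0, 3], -Q[1, 3]
    print("Z axis should intersect the rectified left image at", (cx_rect, cy_rect))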

Related

What is the point of reference / origin for coordinates obtained from a stereo-set up? (OpenCV)

I set up a stereo-vision system to triangulate a 3D point given two 2D points from 2 views (corresponding to the same point). I have some questions on the interpretability of the results.
So my calibration squares are 25 mm a side, and after triangulating and normalizing the homogeneous coordinates (dividing the array of points by the fourth coordinate), I multiply all of them by 25 mm and divide by 10 (to get cm) to get the actual distance from the camera setup.
For example, the final coordinates that I got were something like [-13.29, -5.94, 68.41]. So how do I interpret this? 68.41 is the distance in the Z direction, -5.94 is the position in Y and -13.29 is the position in X. But what is the origin here? By convention, is it the left camera? Or is it the centre of the epipolar baseline? I am using OpenCV for reference.
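For what it's worth, here is a minimal sketch of that pipeline (the projection matrices and the matched point are hypothetical; the normalization and 25 mm scaling follow the description above):

    import numpy as np
    import cv2

    # Hypothetical rectified projection matrices (3x4) for the two views, e.g. the
    # P1, P2 returned by cv2.stereoRectify; calibration done with square size = 1.
    K = np.array([[700.0, 0.0, 320.0],
                  [0.0, 700.0, 240.0],
                  [0.0, 0.0, 1.0]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-4.0], [0.0], [0.0]])])  # baseline of 4 squares

    # One matched point per view (2xN arrays of pixel coordinates).
    pts1 = np.array([[400.0], [260.0]])
    pts2 = np.array([[358.0], [260.0]])

    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4xN homogeneous points
    X = X_h[:3] / X_h[3]                              # normalize by the fourth coordinate
    X_cm = X * 25.0 / 10.0                            # square side = 25 mm, then mm -> cm
    print(X_cm.ravel())

    # cv2.triangulatePoints returns points in whatever frame P1 is expressed in;
    # with P1 = K[I|0] that is the (rectified) left camera: origin at its optical
    # centre, X right, Y down, Z along the optical axis into the scene.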

Find world space coordinate for pixel in OpenCV

I need to find the world coordinate of a pixel using OpenCV. So when I take pixel (0,0) in my image (that's the upper-left corner), I want to know what 3D world-space coordinate this pixel corresponds to on my image plane. I know that a single pixel corresponds to a line of 3D points in world space, but I specifically want the one that lies on the image plane itself.
This is the formula of the OpenCV pinhole model, of which I have the first (intrinsics) and second (extrinsics) matrices. I know that I have u and v, but I don't know how to get from this u and v to the correct X, Y and Z coordinates.
What I've tried already:
I thought to just set s to 1 and make a homogeneous coordinate from [u v 1]^T by adding a 1, like so: [u v 1 1]^T. Then I multiplied the intrinsics with the extrinsics and made the result a 4x4 matrix by adding the row [0 0 0 1]. This was then inverted and multiplied with [u v 1 1]^T to get my X, Y and Z. But when I checked whether four pixels calculated like that lay on the same plane (the image plane), this was wrong.
So, any ideas?
IIUC, you want the intersection I of the image plane with the ray that back-projects a given pixel P from the camera center.
Let's define the coordinate systems first. The usual OpenCV convention is as follows:
Image coordinates: origin at the top-left corner, u axis going right (increasing column) and v axis going down.
Camera coordinates: origin at the camera center C, z axis going toward the scene, x axis going right and y axis going downward.
Then the image plane in the camera frame is z = fx, where fx is the focal length measured in pixels, and a pixel (u, v) has camera coordinates (u - cx, v - cy, fx).
Multiply that by the inverse of the (intrinsic) camera matrix K and you'll get the same point in metric camera coordinates.
Finally, multiply that by the inverse of the world-to-camera coordinate transform [R | t] and you'll get the same point in world coordinates.
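A minimal sketch of those steps (hypothetical K, R, t; the "image plane" here is the normalized z = 1 plane of the camera frame, i.e. the physical image plane up to the focal-length scale):

    import numpy as np

    # Hypothetical intrinsics and world-to-camera extrinsics for illustration.
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R = np.eye(3)             # world-to-camera rotation
    t = np.zeros((3, 1))      # world-to-camera translation

    def pixel_on_image_plane_world(u, v):
        """World coordinates of the point where pixel (u, v) back-projects onto
        the z = 1 (normalized image) plane of the camera frame."""
        p_cam = np.linalg.inv(K) @ np.array([[u], [v], [1.0]])  # ((u-cx)/fx, (v-cy)/fy, 1)
        return (R.T @ (p_cam - t)).ravel()                      # invert x_cam = R X + t

    # The four image corners should come out coplanar: they all sit on z = 1
    # in the camera frame before the rigid transform is undone.
    corners = [pixel_on_image_plane_world(u, v) for u, v in [(0, 0), (639, 0), (639, 479), (0, 479)]]
    print(np.round(corners, 3))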

calculate the real distance between two points using an image

I am doing some image processing tasks in 3D and I have a problem.
I use a simulator which provides a special kind of camera that can tell the distance between the camera's position and any arbitrary point, using the pixel of that point in the camera image. For example, I can get the distance between the camera and the object located at pixel 21:34.
Now I need to calculate the real distance between two arbitrary pixels in the camera image.
It is easy when the camera is vertical, placed above the field, and all objects are on the ground, but when the camera is horizontal the depth of the objects in the image differs.
So, how should I do this?
Simple 3D reconstruction will accomplish this. The distance from the camera to points in 3D is along the optical axis, that is Z, which you already have. You will need X and Y as well:
X = u*Z/f;
Y = v*Z/f,
where f is the camera focal length in pixels, Z is your distance in mm or meters, and u, v are image-centered coordinates: u = column - width/2, v = height/2 - row. Note the asymmetry due to the fact that rows go down while Y and v go up. As soon as you have X, Y and Z, the distance in 3D is given by the Euclidean formula:
dist = sqrt((X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2)
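A minimal sketch of these formulas (the focal length, image size and the two depths below are hypothetical placeholders; Z is taken to be the depth along the optical axis, as described above):

    import math

    def pixel_to_xyz(col, row, Z, f, width, height):
        """Convert a pixel plus its depth Z (along the optical axis) to X, Y, Z,
        using the image-centered coordinates described above."""
        u = col - width / 2.0
        v = height / 2.0 - row          # rows grow downward, v grows upward
        return (u * Z / f, v * Z / f, Z)

    def distance_3d(p1, p2, Z1, Z2, f, width, height):
        X1, Y1, Z1 = pixel_to_xyz(p1[0], p1[1], Z1, f, width, height)
        X2, Y2, Z2 = pixel_to_xyz(p2[0], p2[1], Z2, f, width, height)
        return math.sqrt((X1 - X2) ** 2 + (Y1 - Y2) ** 2 + (Z1 - Z2) ** 2)

    # Hypothetical example: 640x480 image, f = 525 px, two pixels with depths in mm.
    print(distance_3d((100, 200), (400, 250), 1500.0, 1800.0, 525.0, 640, 480))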

finding the real world coordinates of an image point

I have been searching lots of resources on the internet for many days, but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since it is on a plane, all I need is the x and y position (not z). For this purpose I have chosen to go with image processing. The camera (single view, not stereo) position and orientation are fixed with respect to a reference coordinate system on the plane and are known.
I have detected the image pixel coordinates of the centers of the circles using OpenCV. All I need now is to convert these coordinates to the real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site, and other sites as well, a homographic transformation is given as:
p = C[R|T]P, where P is the real-world coordinates and p is the pixel coordinates (both homogeneous). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix and T is the translation matrix. I have followed a tutorial on calibrating the camera in OpenCV (applied the cameraCalibration source file); I have 9 good chessboard images, and as output I have the intrinsic camera matrix and the translational and rotational parameters for each image.
I have the 3x3 intrinsic camera matrix (focal lengths and center pixels) and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 and T is the right 3x1. According to the p = C[R|T]P formula, I assume that by multiplying these parameter matrices with P (world) we get p (pixel). But what I need is to project the p (pixel) coordinates to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we can manipulate such a transformation as P = [R|T]^-1 * C^-1 * p. However, this holds in a Euclidean coordinate system, and I don't know whether such a thing is possible with homogeneous coordinates. Moreover, the 3x4 [R|T] matrix is not invertible, and I don't know whether this is the correct way to go at all.
The intrinsic and extrinsic parameters are known; all I need is the real-world coordinates of the point on the ground plane. Since the point is on a plane, the coordinates are 2-dimensional (depth does not matter, as opposed to general single-view geometry). The camera is fixed (position and orientation). How should I find the real-world coordinates of a point in an image captured by a camera (single view)?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, under the Calibration -> Homography section, it is written: q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], s is an "up to" scale factor (related, I assume, to the homography concept, though I don't understand it clearly), q is the pixel coordinate and Q is the real-world coordinate. It says that in order to get the real-world coordinates (on the chessboard plane) of an object detected on the image plane: since Z = 0, the third column of W can also be dropped (the corresponding rotation axis, I assume); trimming these unnecessary parts, W becomes a 3x3 matrix and H = MW is a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply it with q to get Q = [X Y 1], where the Z coordinate was trimmed.
I applied the mentioned algorithm and got results that cannot lie between the image corners (the image plane was parallel to the camera plane, just ~30 cm in front of the camera, and I got results like 3000; the chessboard square sizes were entered in millimetres, so I assume the output real-world coordinates are again in millimetres). Anyway, I am still trying things. By the way, the results are initially very, very large, but I divide all values in Q by the third component of Q to get (X, Y, 1).
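For reference, here is a minimal sketch of the homography route described in the edit above, with hypothetical M, R, T; it assumes Z = 0 on the chessboard plane and that T is expressed in the same units as the square size:

    import numpy as np

    # Hypothetical intrinsics and chessboard-plane extrinsics from calibration.
    M = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R = np.eye(3)                            # rotation, world (plane) to camera
    T = np.array([[10.0], [20.0], [300.0]])  # translation, same units as the square size

    # Drop the third column of [R|T] because Z = 0 on the plane, giving a 3x3 homography.
    W = np.hstack([R[:, :2], T])
    H = M @ W

    def pixel_to_plane(u, v):
        """Map an image pixel to (X, Y) on the Z = 0 calibration plane."""
        Q = np.linalg.inv(H) @ np.array([u, v, 1.0])
        Q /= Q[2]                            # divide by the third component, as described above
        return Q[0], Q[1]

    print(pixel_to_plane(320, 240))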
FINAL EDIT
I could not get the camera calibration methods to work. Anyway, I should have started with perspective projection and transforms. This way I made very good estimates using a perspective transform between the image plane and the physical plane (the transform was generated from 4 pairs of corresponding coplanar points on the two planes). Then I simply applied the transform to the image pixel points.
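A minimal sketch of that final approach with OpenCV (the four point pairs below are hypothetical; cv2.getPerspectiveTransform builds the image-to-plane transform and cv2.perspectiveTransform applies it to detected pixels):

    import numpy as np
    import cv2

    # Four image pixels and the corresponding physical-plane coordinates (e.g. in mm).
    img_pts = np.float32([[100, 120], [540, 130], [560, 400], [90, 410]])
    world_pts = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

    H = cv2.getPerspectiveTransform(img_pts, world_pts)

    # Map a detected circle centre from the image to the plane.
    centre_px = np.float32([[[320, 260]]])      # shape (1, 1, 2) for perspectiveTransform
    centre_world = cv2.perspectiveTransform(centre_px, H)
    print(centre_world.ravel())                 # (X, Y) on the plane, in mm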
You said "i have the intrinsic camera matrix, and translational and rotational params of each of the image.” but these are translation and rotation from your camera to your chessboard. These have nothing to do with your circle. However if you really have translation and rotation matrices then getting 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C^-1 * [u, v, 1]^T, where u = col - w/2 and v = h/2 - row; here col, row are the image column and row and w, h are the image width and height. As a result you will obtain a 3D point in so-called camera-normalized coordinates, p = [x, y, z]^T. All you need to do now is to subtract the translation and apply a transposed rotation: P = R^T(p - T). The order of operations is the inverse of the original (which was rotate, then translate); note that the transposed rotation performs the inverse of the original rotation but is much faster to compute than R^-1.
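Here is a minimal sketch of that recipe (hypothetical C, R, T; note it feeds the raw pixel (col, row, 1) to C^-1 rather than the pre-centered u, v above, since under the usual OpenCV convention the principal point inside C already does the centering):

    import numpy as np

    # Hypothetical calibration output for illustration.
    C = np.array([[700.0, 0.0, 640.0],
                  [0.0, 700.0, 360.0],
                  [0.0, 0.0, 1.0]])        # intrinsic matrix
    R = np.eye(3)                          # rotation,    world -> camera
    T = np.array([[0.0], [0.0], [500.0]])  # translation, world -> camera

    def backproject(col, row):
        """Camera-normalized ray point for a pixel, mapped to world coordinates."""
        p = np.linalg.inv(C) @ np.array([[col], [row], [1.0]])
        # Undo x_cam = R X + T:  X = R^T (x_cam - T); R^T equals R^-1 for a rotation.
        return (R.T @ (p - T)).ravel()

    print(backproject(640, 360))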

The angle between an object and Kinect's optic axis

Here's my Setup: Kinect mounted on an actuator for horizontal movement.
Here's a short demo of what I am doing. http://www.youtube.com/watch?v=X1aSMvDQhDM
Here's my Scenario:
Please refer to the figure above. Assume the distance between the centre of the actuator, 'M', and the centre of the optic axis of the Kinect, 'C', is 'dx' (millimetres). The depth information 'D' (millimetres) obtained from the Kinect is relative to the optic axis. Since I now have an actuator mounted at the centre of the Kinect, the actual depth between the object and the Kinect is 'Z'.
X is the distance between the optical axis and the object, in pixels. Theta2 is the angle between the optic axis and the object. 'dy' can be ignored.
Here's my Problem.
To obtain Z, I can simply use the distance equation in Figure 2. However, I do not know the real-world value of X in mm. If I had the angle between the object and the optical axis, 'theta2', I could use D*sin(theta2) to obtain X in mm. However, theta2 is also unknown: if X (in mm) is known, I can get theta2, and if theta2 is known, I can get X. So how should I obtain either the value of X in mm or the angle between the optic axis and the object P?
Here's what I've tried:
Since I know the maximum field of view of the Kinect is 57 degrees, and its maximum horizontal resolution is 640 pixels, I can say that 1 degree of the Kinect's view covers 11.228 (640/57) pixels. However, through experiments I discovered that this results in errors of at least 2 degrees. I suspect it's due to lens distortion on the Kinect, but I don't know how to compensate for or normalise it.
Any ideas or help would be greatly appreciated.
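One hedged sketch of an alternative to the constant pixels-per-degree mapping: convert the stated field of view into a focal length in pixels and use atan, which is exact for an ideal pinhole camera (the 57 degrees and 640 px are taken from the question; D and the pixel offset below are hypothetical, and a calibrated focal length would be more accurate than one derived from the nominal FOV):

    import math

    FOV_H_DEG = 57.0      # horizontal field of view from the question
    WIDTH_PX = 640        # horizontal resolution from the question

    # Pinhole model: half the image width subtends half the field of view.
    fx = (WIDTH_PX / 2.0) / math.tan(math.radians(FOV_H_DEG / 2.0))   # ~589 px

    def angle_from_axis(x_px):
        """Angle (degrees) between the optic axis and the ray through a pixel that is
        x_px pixels from the image centre. Unlike a constant pixels-per-degree factor,
        this follows the pinhole model, so the error grows less toward the image edges."""
        return math.degrees(math.atan2(x_px, fx))

    theta2 = angle_from_axis(200.0)           # e.g. object 200 px right of centre
    D = 2000.0                                # hypothetical range to the object, in mm
    X_mm = D * math.sin(math.radians(theta2)) # the question's own D*sin(theta2) relation
    print(theta2, X_mm)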
