Find world space coordinate for pixel in OpenCV

I need to find the world coordinate of a pixel using OpenCV. So when I take pixel (0,0) in my image (that's the upper-left corner), I want to know which 3D world space coordinate this pixel corresponds to on my image plane. I know that a single pixel corresponds to a line of 3D points in world space, but I want specifically the one that lies on the image plane itself.
This is the formula of the OpenCV pinhole model, of which I have the first (intrinsics) and second (extrinsics) matrices: s * [u, v, 1]^T = K * [R|t] * [X, Y, Z, 1]^T. I know that I have u and v, but I don't know how to get from this u and v to the correct X, Y and Z coordinate.
What I've tried already:
I thought I could just set s to 1 and make a homogeneous coordinate from [u v 1]^T by adding a 1, like so: [u v 1 1]^T. Then I multiplied the intrinsics with the extrinsics and made the result into a 4x4 matrix by adding the row [0 0 0 1]. This was then inverted and multiplied with [u v 1 1]^T to get my X, Y and Z. But when I checked whether four pixels calculated like that lay on the same plane (the image plane), this turned out to be wrong.
So, any ideas?

IIUC, you want the intersection I of the image plane with the ray that back-projects a given pixel P from the camera center.
Let's define the coordinate systems first. The usual OpenCV convention is as follows:
Image coordinates: origin at the top-left corner, u axis going right (increasing column) and v axis going down.
Camera coordinates: origin at the camera center C, z axis going toward the scene, x axis going right and y axis going downward.
Then the image plane in camera frame is z=fx, where fx is the focal length measured in pixels, and a pixel (u, v) has camera coordinates (u - cx, v - cy, fx).
Multiply them by the inverse of the intrinsic camera matrix K and you'll get the same point in metric camera coordinates.
Finally, multiply that by the inverse of the world-to-camera coordinate transform [R | t] and you'll get the same point in world coordinates.
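
A minimal sketch of these steps in Python (my own code, not from the answer; it assumes [R | t] is the world-to-camera transform, i.e. X_cam = R * X_world + t, and uses the normalized plane z = 1 as the metric image plane):

import numpy as np

def backproject_pixel_to_image_plane(u, v, K, R, t):
    # Pixel -> metric camera coordinates: K^-1 @ [u, v, 1] lies on the
    # normalized image plane z = 1 in the camera frame.
    p_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Camera frame -> world frame: invert X_cam = R @ X_world + t.
    return R.T @ (p_cam - t.reshape(3))

Points computed this way for different pixels should all be coplanar (they lie on the image plane expressed in world coordinates), which is exactly the check described in the question.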

Related

OpenCV Stereo Photogrammetry - why is my Z axis not in line with the principal point?

As I understand OpenCV's coordinate system (as in this diagram), the left camera of a calibrated stereo pair is located at the origin, facing the Z direction.
I have a pair of 2464x2056 pixel cameras that I have calibrated (with a stereo rms of around 0.35), computed the disparity on a pair of images and reprojected this to get the 3D pointcloud. However, I've noticed that the Z axis is not in line with the optical centre of the camera.
This does kind of mess with some of the point cloud manipulation I'm hoping to do. Is this expected, or does it indicate that something has gone wrong along the way?
Below is the point cloud I've generated, plus the axes: the red, green and blue lines indicate the x, y and z axes respectively, coming out from the origin.
As you can see, the Z axis intercepts the point cloud between the head and the post. This corresponds to a pixel coordinate of approximately x = 637, y = 1028 when I fix the principal point during calibration to cx = 1232, cy = 1028. When I remove the CV_FIX_PRINCIPAL_POINT flag, this is calculated as approximately cx = 1310, cy = 1074, and the Z axis intercepts at around x = 310, y = 1050.
Compared to the rectified image, where the midpoint x = 1232, y = 1028 is marked by a yellow cross and the centre of the image is over the mannequin's head, the intersection of the Z axis is significantly off from where I would expect.
Does anyone have any idea as to why this could be occurring? Any help would be greatly appreciated.

Camera projection matrix principal point

I'm a little confused about the purpose of adding the offsets of the principal point in the camera matrix. These equations are from the OpenCV docs: u = f_x * x' + c_x and v = f_y * y' + c_y (with x' = X/Z and y' = Y/Z).
I understand all of this except for adding c_x and c_y. I've read that we do this in order to shift the origin of the projected point so that it's relative to (0, 0), the top left of the image. However, I don't know how adding the coordinates of the center of the image (the principal point) accomplishes this. I think it's simple geometry, but I'm having a hard time understanding.
Just take a look at the diagram in your question. The x/y coordinate system has its origin somewhere around the center of the image. I.e., there can be negative coordinates. The u/v coordinate system has its origin at the top left corner, i.e., there can be no negative coordinates. For the purpose of this question, I will consider the x/y coordinate system to already be scaled with fx, fy, i.e., (x, y) = (fx * x', fy * y').
What you want to do is transform the coordinates from the x/y coordinate system to the u/v coordinate system. Let's look at a few examples:
The origin in x/y (0, 0) will map to (cx, cy) in u/v.
The top left corner (i.e., (0, 0) in u/v) has the coordinates (-cx, -cy) in x/y.
You could establish many more examples. They all have in common that (u, v) = (x, y) + (cx, cy). And this is the transform stated in the equations.
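
As a toy illustration (the intrinsic values below are made up), adding the principal point is the entire shift from the centered x/y system to the u/v pixel system:

# Made-up intrinsics, for illustration only.
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0

def to_pixels(x, y):
    # (x, y) are already scaled by fx, fy, as assumed in the answer above.
    return x + cx, y + cy

print(to_pixels(0, 0))      # (320.0, 240.0): the optical axis maps to the principal point
print(to_pixels(-cx, -cy))  # (0.0, 0.0): the top-left corner of the image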

How to calculate camera orientation using one point in large distance (using opencv)?

Let's say I have a pinhole camera with known intrinsic values, like the camera matrix and distortion coefficients. Let's say there is a point at a large enough distance from the camera that we can say it is placed at infinity.
Given the image coordinates of this point in pixels, I would like to calculate the camera rotation relative to the axis that connects the camera and this point (so the rotation is 0, 0 if the camera is pointed at this point and it lies at the optical center of the image).
How can this be done using opencv?
Many thanks!
You need to specify an additional constraint - rotating the camera from its current pose to one that aligns the optical axis with an arbitrary ray leaves the camera free to rotate about the ray itself (i.e. it leaves the "roll" angle unspecified).
Let's assume that you want the roll to be zero, i.e. that you want the motion to be a pure pan-tilt. This has a unique solution as long as the ray you want to align to is not parallel to the vertical image axis (in which case pan and roll are the same motion).
Then the solution is computed as follows. Let's use the OpenCV camera frame: let Z = [0, 0, 1]' (where ' means transpose) be the camera focal axis, oriented going out of the lens, Y = [0, 1, 0]' the vertical axis going down, and X = Y x Z (where 'x' is the cross product) the horizontal camera axis going toward the right of the image. So "pan" is a rotation about Y and "tilt" is a rotation about X.
Let U = [u1, u2, u3]', with ||U|| = 1, be the ray you want to rotate to. You want to apply a pan that brings Z onto the plane Puy defined by the vectors U and Y, then apply a tilt that brings Z onto U.
The angle of the first rotation is (angle between Z and Puy) = [90 deg - (angle between Z and Y x U)]. This is because Y x U is orthogonal to Puy. Look up the expressions for computing the angle between vectors on Wikipedia or elsewhere online. Once you have the angle (or its cosine and sine), the rotation about Y can be expressed as a standard rotation matrix Ry.
The angle of the second rotation, about X once Z is on Puy, is the angle between the vector Z and U after Ry is applied to Z, or equivalently, between Z and inv(Ry) * U. Compute the angle between these vectors and use it to build a standard rotation matrix about X, Rx.
The final transformation is then Rx * Ry.
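
Here is a sketch of this construction in Python (my own implementation, with assumed conventions, not code from the answer). U is the unit ray in the camera frame; for a pixel (u, v) you can take U as the normalized K^-1 @ [u, v, 1]. The returned matrix rotates the optical axis Z = [0, 0, 1]' onto U with zero roll; take its transpose if you need the opposite convention:

import numpy as np

def pan_tilt_rotation(U):
    U = np.asarray(U, dtype=float)
    U = U / np.linalg.norm(U)
    # Pan about Y: align the XZ projection of Z with the XZ projection of U.
    pan = np.arctan2(U[0], U[2])
    Ry = np.array([[ np.cos(pan), 0.0, np.sin(pan)],
                   [ 0.0,         1.0, 0.0        ],
                   [-np.sin(pan), 0.0, np.cos(pan)]])
    # Tilt about the rotated X axis: angle between Z and inv(Ry) @ U.
    V = Ry.T @ U                      # V = [0, U[1], sqrt(U[0]^2 + U[2]^2)]
    tilt = np.arctan2(-V[1], V[2])
    Rx = np.array([[1.0, 0.0,            0.0          ],
                   [0.0, np.cos(tilt), -np.sin(tilt)],
                   [0.0, np.sin(tilt),  np.cos(tilt)]])
    R = Ry @ Rx                       # sanity check: R @ [0, 0, 1] equals U
    return R, np.degrees(pan), np.degrees(tilt)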

Calculate the real distance between two points using an image

I am doing an image processing task in 3D and I have a problem.
I use a simulator which provides a special kind of camera that can tell the distance between the camera position and any arbitrary point, using that point's pixel in the camera image. For example, I can get the distance between the camera and the object located at pixel 21:34.
Now I need to calculate the real distance between two arbitrary pixels in the camera image.
It is easy when the camera is vertical, placed above the field, and all objects are on the ground, but when the camera is horizontal the depth of the objects in the image differs.
So, how should I do this?
Simple 3D reconstruction will accomplish this. The distance from the camera to a point in 3D is measured along the optical axis, that is Z, which you already have. You will need X and Y as well:
X = u*Z/f;
Y = v*Z/f,
where f is the camera focal length in pixels, Z is your distance in mm or meters, and u, v are image-centered coordinates: u = column - width/2, v = height/2 - row. Note the asymmetry, due to the fact that rows go down while Y and v go up. As soon as you have your X, Y, Z, the distance in 3D is given by the Euclidean formula:
dist = sqrt((X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2)
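
A small sketch of these formulas in Python (the focal length, image size and depths below are made up for illustration):

import math

def pixel_to_3d(col, row, z, f, width, height):
    u = col - width / 2.0      # image-centered horizontal coordinate
    v = height / 2.0 - row     # image-centered vertical coordinate (rows go down)
    return (u * z / f, v * z / f, z)

def distance_3d(p1, p2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

# Two pixels with depths reported by the simulator (values assumed):
P1 = pixel_to_3d(21, 34, 2.5, f=800.0, width=640, height=480)
P2 = pixel_to_3d(300, 200, 3.1, f=800.0, width=640, height=480)
print(distance_3d(P1, P2))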

Finding the real world coordinates of an image point

I have been searching through lots of resources on the internet for many days, but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since it is on a plane, all I need is the x and y position (not z). For this purpose I have chosen to go with image processing. The camera (single view, not stereo) position and orientation are fixed with respect to a reference coordinate system on the plane, and both are known.
I have detected the image pixel coordinates of the centers of the circles using OpenCV. All I need now is to convert those coordinates to the real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site, and on other sites as well, a homographic transformation is given as:
p = C[R|T]P, where P is the real world coordinate and p is the pixel coordinate (in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix and T is the translation matrix. I have followed a tutorial on calibrating the camera in OpenCV (applied the cameraCalibration source file); I have 9 fine chessboard images, and as an output I have the intrinsic camera matrix and the translational and rotational parameters of each image.
I have the 3x3 intrinsic camera matrix (focal lengths and center pixels), and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 and T is the right 3x1. According to the p = C[R|T]P formula, I assume that by multiplying these parameter matrices with P (world) we get p (pixel). But what I need is to project the p (pixel) coordinates to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we can manipulate such a transformation as P = [R|T]^-1 * C^-1 * p. However, this is in a Euclidean coordinate system; I don't know whether such a thing is possible with homogeneous coordinates. Moreover, the 3x4 [R|T] matrix is not invertible, and I don't know if this is the correct way to go.
The intrinsic and extrinsic parameters are known; all I need is the real world coordinate of the point projected onto the ground plane. Since the point is on a plane, the coordinates will be 2-dimensional (depth is not important, as opposed to general single view geometry). The camera is fixed (position, orientation). How should I find the real world coordinate of a point in an image captured by a camera (single view)?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, under the Calibration -> Homography section, it is written: q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], and s is an "up to scale" factor which I assume is related to the homography concept (I don't know it clearly); q is the pixel coordinate and Q is the real coordinate. It says that in order to get the real world coordinate (on the chessboard plane) of an object detected on the image plane, since Z = 0, the third column of W can also be dropped (the rotation about that axis, I assume). Trimming these unnecessary parts, W becomes a 3x3 matrix and H = MW is a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply it with q to get Q = [X Y 1], where the Z coordinate was trimmed.
I applied the mentioned algorithm and got some results that cannot lie between the image corners (the image plane was parallel to the camera plane, just ~30 cm in front of the camera, and I got results like 3000). The chessboard square sizes were entered in millimeters, so I assume the outputted real world coordinates are again in millimeters. Anyway, I am still trying stuff. By the way, the results were previously very, very large, but I divide all values in Q by the third component of Q to get (X, Y, 1).
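
For reference, here is a sketch of that H = MW route in Python (my own code, not from the book; it assumes M is the 3x3 intrinsic matrix and rvec, tvec are the extrinsics of the chessboard plane returned by cv2.calibrateCamera for one view):

import numpy as np
import cv2

def image_to_board_plane(u, v, M, rvec, tvec):
    R, _ = cv2.Rodrigues(rvec)
    # Drop the third column of [R|T] because Z = 0 on the board plane.
    W = np.column_stack((R[:, 0], R[:, 1], tvec.reshape(3)))
    H = M @ W                                    # 3x3 homography: board plane -> image
    Q = np.linalg.inv(H) @ np.array([u, v, 1.0])
    Q /= Q[2]                                    # divide by the third component of Q
    return Q[0], Q[1]                            # (X, Y) in board units (e.g. millimeters)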
FINAL EDIT
I could not get the camera calibration methods to work. Anyway, I should have started with perspective projection and transforms. This way I made very good estimates with a perspective transform between the image plane and the physical plane (having generated the transform from 4 pairs of corresponding coplanar points on the two planes). Then I simply applied the transform to the image pixel points.
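
A minimal sketch of that final approach (the point correspondences below are made up): build the transform from 4 coplanar point pairs with cv2.getPerspectiveTransform and map detected pixel coordinates onto the plane with cv2.perspectiveTransform:

import numpy as np
import cv2

img_pts   = np.float32([[100, 120], [520, 110], [540, 400], [ 90, 410]])  # pixels
world_pts = np.float32([[  0,   0], [400,   0], [400, 300], [  0, 300]])  # e.g. mm on the plane

H = cv2.getPerspectiveTransform(img_pts, world_pts)

circle_px = np.float32([[[310, 260]]])                 # detected circle center, in pixels
circle_world = cv2.perspectiveTransform(circle_px, H)  # (X, Y) on the physical plane
print(circle_world)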
You said "I have the intrinsic camera matrix, and translational and rotational params of each of the images", but those are the translation and rotation from your camera to your chessboard. They have nothing to do with your circle. However, if you really have the translation and rotation matrices, then getting the 3D point is easy.
Apply the inverse intrinsic matrix to your screen point in homogeneous notation: C^-1 * [u, v, 1]^T, where u and v are the pixel column and row (C^-1 already accounts for the principal point offset). As a result you will obtain a 3D point in so-called normalized camera coordinates, p = [x, y, z]^T. All you need to do now is subtract the translation and apply a transposed rotation: P = R^T(p - T). The order of operations is the inverse of the original, which was rotate and then translate; note that the transposed rotation performs the inverse of the original rotation but is much faster to compute than R^-1.
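
A small sketch of these steps in Python (my own code, with the assumed convention p_cam = R * P_world + T); it also intersects the back-projected ray with the ground plane Z = 0, which is what the question ultimately asks for:

import numpy as np

def pixel_to_ground_point(u, v, C, R, T):
    # Back-project pixel (u, v) and intersect the ray with the world plane Z = 0.
    T = np.asarray(T, dtype=float).reshape(3)
    ray_cam = np.linalg.inv(C) @ np.array([u, v, 1.0])   # C^-1 * [u, v, 1]
    ray_world = R.T @ ray_cam                            # ray direction in the world frame
    cam_center = -R.T @ T                                 # camera center in world coordinates
    s = -cam_center[2] / ray_world[2]                     # scale that reaches Z = 0
    return cam_center + s * ray_world                     # world point on the ground plane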
