Camera projection matrix principal point - opencv

I'm a little confused about the purpose of adding the offsets of the principal point, in the camera matrix. These equations are from OpenCV Docs.
I understand all of this except for adding c_x and c_y. I've read that we do this in order to shift the origin of the projected point so that it's relative to (0, 0), the top left of the image. However, I don't know how adding the coordinates of the center of the image (the principal point) accomplishes this. I think it's simple geometry, but I'm having a hard time understanding.

Just take a look at the diagram in your question. The x/y coordinate system has its origin somewhere around the center of the image. I.e., there can be negative coordinates. The u/v coordinate system has its origin at the top left corner, i.e., there can be no negative coordinates. For the purpose of this question, I will consider the x/y coordinate system to already be scaled with fx, fy, i.e., (x, y) = (fx * x', fy * y').
What you want to do is transform the coordinates from the x/y coordinate system to the u/v coordinate system. Let's look at a few examples:
The origin in x/y (0, 0) will map to (cx, cy) in u/v.
The top left corner (i.e., (0, 0) in u/v) has the coordinates (-cx, -cy) in x/y.
You could establish many more examples. They all have in common that (u, v) = (x, y) + (fx, fy). And this is the transform stated in the equations.

Related

OpenCV Stereo Photogrammetry- why my Z axis not in line with the principal point?

As I understand OpenCV's coordinate system, as in this diagram.
The left camera of a calibrated stereo pair is located at the origin facing the Z direction.
I have a pair of 2464x2056 pixel cameras that I have calibrated (with a stereo rms of around 0.35), computed the disparity on a pair of images and reprojected this to get the 3D pointcloud. However, I've noticed that the Z axis is not in line with the optical centre of the camera.
This does kind of mess with some of the pointcloud manipulation I'm hoping to do- is this expected, or does it indicate that that something has gone wrong along the way?
Below is the point I've generated, plus the axis- the red green and blue lines indicate the x,y and z axes respectively, coming out from the origin.
As you can see, the Z axis intercepts the pointcloud between the head and the post- this corresponds to a pixel coordinate of approximately x = 637, y = 1028 when I fix the principal point during calibration to cx = 1232,y=1028. When I remove the CV_FIX_PRINCIPAL_POINT flag, this is calculated as approximatly cx = 1310, cy=1074, and the Z axis intercepts at around x=310,y=1050.
Compared to the rectified image here where the midpoint x = 1232,y=1028 is marked by a yellow cross, the centre of the image is over the mannequin had, the intersection between the Z axis is significantly off from where I would expect.
Does anyone have any idea as to why this could be occuring? Any help would be greatly appreciated.

normalized point coordinates in undistortPoints function

In undistortPoints function from OpenCV, the documentations says that
http://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#undistortpoints
where undistort() is an approximate iterative algorithm that estimates the normalized original point coordinates out of the normalized distorted point coordinates (“normalized” means that the coordinates do not depend on the camera matrix).
It seems that the normalized point coordinates is obtained by adding 1 to the third coordinate. What does normalized point coordinates means? How can it be used for?
In the above, there are two lines
x" = (u - cx)/fx
y" = (v - cy)/fy
Is there one term for the coordinates(x'', y'')?
I'm not entirely sure what you mean by "Is there one term for the coordinates (x", y")", but if you mean what do they physically represent, then they are the coordinates of the image point (u, v) on the image plane expressed in the camera coordinate system (origin at the centre of projection, x-axis to the right, y-axis down, z-axis pointing out towards the scene and perpendicular to the image plane), whereas (u,v) are the coordinates of the image point relative to the origin at the top left corner of the image plane (x-axis to the right, y-axis down). All quantities are expressed in pixels.
The output of the undistortPoints function are normalised coordinates, which means that the points returned in the dst parameter have their (x", y") coordinates between 0 and 1 (not shown in the equations you presented, but is the output of the internally called undistort function within undistortPoints).
2D coordinates (whether normalised or not) that have a 1 inserted as the third coordinate are known as homogenous coordinates. The same can be done for 3D coordinates by inserting a 1 into the 4th element. Homogenous coordinates are useful because they allow certain operations to be represented as a simple linear equation whereas their non-homogenous equivalent may not be as straightforward.

Find world space coordinate for pixel in OpenCV

I need to find the world coordinate of a pixel using OpenCV. So when I take pixel (0,0) in my image (that's the upper-left corner), I want to know to what 3D world space coordinate this pixel corresponds to on my image plane. I know that a single pixel corresponds to a line of 3D points in world space, but I want specific the one that lies on the image plane itself.
This is the formula of the OpenCV Pinhole model of which I have the first (intrinsics) and second (extrinsics) matrices. I know that I have u and v, but I don't know how to get from this u and v to the correct X, Y and Z coordinate.
What I've tried already:
I thought to just set s to 1 and make a homogeneous coordinate from [u v 1]^T by adding a 1, like so: [u v 1 1]^T. Then I multiplied the intrinsics with the extrinsics and made it into a 4x4 matrix by adding the following row: [0 0 0 1]. This was then inverted and multiplied with [u v 1 1]^T to get my X, Y and Z. But when I checked if four pixels calculated like that lay on the same plane (the image plane), this was wrong.
So, any ideas?
IIUC you want the intersection I with the image plane of the ray that back-projects a given pixel P from the camera center.
Let's define the coordinate systems first. The usual OpenCV convention is as follows:
Image coordinates: origin at the top-left corner, u axis going right (increasing column) and v axis going down.
Camera coordinates: origin at the camera center C, z axis going toward the scene, x axis going right and y axis going downward.
Then the image plane in camera frame is z=fx, where fx is the focal length measured in pixels, and a pixel (u, v) has camera coordinates (u - cx, v - cy, fx).
Multiply them by the inverse of the (intrinsic) camera matrix K you'll get the same point in metrical camera coordinates.
Finally, multiply that by the inverse of the world-to-camera coordinate transform [R | t] and you'll get the same point in world coordinates.

How to calculate camera orientation using one point in large distance (using opencv)?

Let's say I have a pinhole camera with known intristic values like camera matrix and distortion coefficients. Let's say there is a point in large enough distance from the camera, so we can say it is placed in infinity.
Given image coordinates of this point in pixels, I would like to calculate camera rotation relative to the axis that connects camera and this point (so rotation is 0,0 if camera is directed at this point and it is in the optical center of the image).
How can this be done using opencv?
Many thanks!
You need to specify an additional constraint - rotating the camera from its current pose to one that aligns the optical axis with an arbitrary ray leaves the camera free to rotate about the ray itself (i.e. it leaves the "roll" angle unspecified).
Let's assume that you want the roll to be zero, i.e. that you want the motion to be a pure pan-tilt. This has a unique solution as long as the ray you want to align to is not parallel to the vertical image axis (in which case pan and roll are the same motion).
Then the solution is computed as follows. Let's use the OpenCV camera frame: Z=[0,0,1]' (, where " ' " means transpose) be the camera focal axis, oriented going out of the lens, Y=[0,1,0]' the vertical axis going down, and X = Z x Y (where 'x' is the cross product) the horizontal camera axis going toward the right of the image. So "pan" is a rotation about Y, "tilt" is a rotation about X.
Let U = [u1, u2, u3]', with || u || = 1 be the ray you want to rotate to. You want to apply a pan that brings Z onto the plane Puy defined by the vectors u and Y, then apply a tilt that brings Z onto u.
The angle of the first rotation is (angle between Z and Puy) = [90 deg - (angle between Z and Y x U)]. this is because Y x U is orthogonal to Puy. Look up the expressions for computing the angle between vectors on Wikipedia or elsewhere online. Once you have the angle (or its cosine and sine), the rotation about Y can be expressed as a standard rotation matrix Ry.
The angle of the second rotation, about X after once Z is onto Puy, is the angle between vector Z and U after Ry is applied to Z, or equivalently, between Z and inv(Ry) * U. Compute the angle between the vector, and use to build a standard rotation matrix about X, Rx
The final transformation is then Rx * Ry.

How to transform an image based on the position of camera

I'm trying to create a perspective projection of an image based on the look direction. I'm unexperienced on this field and can't manage to do that myself, however. Will you help me, please?
There is an image and an observer (camera). If camera can be considered an object on an invisible sphere and the image a plane going through the middle of the sphere, then camera position can be expressed as:
x = d cos(θ) cos(φ)
y = d sin(θ)
z = d sin(φ) cos(θ)
Where θ is latitude, φ is longitude and d is the distance (radius) from the middle of the sphere where the middle of the image is.
I found these formulae somwhere, but I'm not sure about the coordinates (I don't know but it looks to me that x should be z but I guess it depends on the coordinate system).
Now, what I need to do is make a proper transformation of my image so it looks as if viewed from the camera (in a proper perspective). Would you be so kind to tell me a few words how this could be done? What steps should I take?
I'm developing an iOS app and I thought I could use the following method from the QuartzCore. But I have no idea what angle I should pass to this method and how to derive the new x, y, z coordinates from the camera position.
CATransform3D CATransform3DRotate (CATransform3D t, CGFloat angle,
CGFloat x, CGFloat y, CGFloat z)
So far I have successfully created a simple viewing perspective by:
using an identity matrix (as the CATransform3D parameter) with .m34 set to 1/-1000,
rotating my image by the angle of φ with the (0, 1, 0) vector,
concatenating the result with a rotation by θ and the (1, 0, 0) vector,
scaling based on the d is ignored (I scale the image based on some other criteria).
But the result I got was not what I wanted (which was obvious) :-/. The perspective looks realistic as long as one of these two angles is close to 0. Therefore I thought there could be a way to calculate somehow a proper angle and the x, y and z coordinates to achieve a proper transformation (which might be wrong because it's just my guess).
I think I managed to find a solution, but unfortunately based on my own calculations, thoughts and experiments, so I have no idea if it is correct. Seems to be OK, but you know...
So if the coordinate system is like this:
and the plane of the image to be transformed goes through the X and the Y axis, and its centre is in the origin of the system, then the following coordinates:
x = d sin(φ) cos(θ)
y = d sin(θ)
z = d cos(θ) cos(φ)
define a vector that starts in the origin of the coordinate system and points to the position of the camera that is observing the image. The d can be set to 1 so we get a unit vector at once without further normalization. Theta is the angle in the ZY plane and phi is the angle in the ZX plane. Theta raises from 0° to 90° from the Z+ to the Y+ axis, whereas phi raises from 0° to 90° from the Z+ to the X+ axis (and to -90° in the opposite direction, in both cases).
Hence the transformation vector is:
x1 = -y / z
y1 = -x / z
z1 = 0.
I'm not sure about z1 = 0, however rotation around the Z axis seemed wrong to me.
The last thing to calculate is the angle by which the image has to be transformed. In my humble opinion this should be the angle between the vector that points to the camera (x, y, z) and the vector normal to the image, which is the Z axis (0, 0, 1).
The dot product of two vectors gives the cosine of the angle between them, so the angle is:
α = arccos(x * 0 + y * 0 + z * 1) = arccos(z).
Therefore the alpha angle and the x1, y1, z1 coordinates are the parameters of CATransform3DRotate method I mentioned in my question.
I would be grateful if somebody could tell me if this approach is correct. Thanks a lot!

Resources