Is there a reverse function of lookat for glMatrix? - webgl

I am using glMatrix to code WebGL and want to get the eye position, focal point and up direction from the existing projection and view matrix (kind of like the reverse of the lookAt function). Is there any way to do this?

I didn't implement one, no. I'm not even sure that you could decompose it into the original vectors, for that matter. The lookAt point could be anywhere along a ray from the origin, and how would you determine what the appropriate up vector was? I'm thinking this is a one-way algorithm (just too lazy to prove it!)
Beyond that, however, I question whether you would want to do this even if there were a method for it. I'm willing to bet that it's almost always more beneficial to track the values you're using and manipulate them directly rather than trying to pull them back and forth between vectors and a matrix.

Yes and no: yes, you can invert the model view transformation, and no, you will not get all three vectors back exactly the same.
The model view transformation built by lookAt is very similar to the connectTo operation used in CSG models: it mounts your scene in front of your camera. This is done by a translation and three axis rotations. The eye point is translated to (0,0,0) and all further rotation is done around it. You can easily recover the eye point by transforming (0,0,0) with the inverse matrix.
But the center point is only used to align the axis of view with the -Z axis; in OpenGL the eye faces -Z. The distance between center and eye is lost. So you can easily get a center point along your axis of view if you define the distance yourself. Say we want a distance of d: then we just transform (0,0,-d) with the inverse matrix and get a valid center point, though not exactly the original one. The center point defines only two rotation angles, the camera pan and tilt.
Even worse is the reconstruction of the up vector. It is only used for the roll angle of the camera and thus contributes only one scalar value. So for the inverse transformation you can choose not only any positive value along the Y axis, but any point in the YZ plane with a positive Y value. To get an up vector perfectly normal to the viewing axis and of length 1, we simply transform (0,1,0) with the inverse matrix. Remember to transform it as a vector this time (not as a point).
Now we have eye, center and up reconstructed in a way that makes lookAt produce exactly the same matrix next time. But since this matrix only contains 6 values of information (translation, pan, tilt, roll), we had to choose the 3 values that were lost (distance from center to eye, and the length and angle of the up vector within the camera's YZ plane).
The model view matrix can of course encode other transformations (any affine transform), but the lookAt function uses it only for translation and rotation. It adjusts the scene in front of the camera without distorting it.
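For completeness, here is a minimal numpy sketch of that reconstruction (not part of glMatrix; the same operations map to glMatrix's mat4.invert and vec3.transformMat4). It assumes view is a lookAt view matrix in the usual column-vector convention, and d is the eye-to-center distance you have to pick yourself, since the original one is lost:

import numpy as np

def decompose_lookat(view, d=1.0):
    inv = np.linalg.inv(view)
    # Points transform with w = 1: the eye is wherever (0,0,0) ends up,
    # and a valid center lies d units down the viewing axis (-Z).
    eye = (inv @ np.array([0.0, 0.0, 0.0, 1.0]))[:3]
    center = (inv @ np.array([0.0, 0.0, -d, 1.0]))[:3]
    # Directions transform with w = 0: a unit up vector normal to the view axis.
    up = (inv @ np.array([0.0, 1.0, 0.0, 0.0]))[:3]
    return eye, center, up

Feeding the returned eye, center and up back into lookAt should reproduce the original view matrix exactly.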

Related

Why do we need to convert Rotational vector to Rotational matrix to calculate angle

I cannot find a proper source to understand why we need to convert a rotation vector to a rotation matrix [in the context of calculating the angle between two ArUco markers].
We are using
rmat, _ = cv2.Rodrigues(rvec)
rmat1, _ = cv2.Rodrigues(rvec1)
relative_rmat = rmat1 @ rmat.T
My questions are
why are we converting the rotation vector to a rotation matrix
And can I please get the source of relative_rmat's formula? I am trying to understand the geometrical concept.
I have tried to understand it from Wikipedia, but I am getting more confused. It would be helpful if anyone could provide a source for the concept behind both questions.
Translation of a rigid body (or tvec as used in OpenCV conventions) lives in 3D Euclidean space. We call this the 'configuration space'.
Assume there is a rigid body at a point we call pos1. A 3x1 vector pos1 = [x1, y1, z1] completely defines its position uniquely. The term 'unique' means that there is no other way to define pos1 without [x1, y1, z1], and also if you go to [x1, y1, z1], it will always be pos1.
Any attitude (a rotation of a rigid body around three-dimensional space) can be represented in many different ways. However, attitudes live in a place called 'Special Orthogonal Space or SO(3)'. This is the configuration space for rotations, and elements in this space are what you were referring to as 'rotation matrices'.
All other ways of defining a rotation, like Euler angles, rotation vectors (rvec in OpenCV), or quaternions, are 'local parameterizations' of a given rotation matrix. So they have several issues, including not being unique at some rotations. The Gimbal Lock Wikipedia page has some nice visualizations.
To avoid such issues, the simplest way is to use rotation matrices. Even though they seem complex, rotation matrices can be much easier to work with once you get used to them. Given the properties of SO(3), the inverse of a rotation matrix is the transpose of that matrix (which is why you get the rmat.T when you are trying to get the relative rotation).
Let's assume that you have two markers named 'marker' and 'marker1' that correspond to rvec and rvec1 respectively. Your rmat is the rotation of 'marker' with respect to the camera frame, or how a vector in 'marker' can be represented in the camera frame (I know this can be confusing, but this is how it is defined, so stay with me).
Similarly, rmat1 is how a vector in 'marker1' is represented in the camera frame. Also keep in mind that these matrices are directional, meaning we need inverse(rmat1) if we want to find how a vector in the camera frame is represented in the 'marker1' frame.
Your relative_rmat is how you represent a vector in 'marker' in 'marker1'. You cannot hop directly between markers; you always need to go through a common frame. First you transform a vector in 'marker' to the camera frame, and then transform that to the 'marker1' frame. We can write that as
relative_rmat = rmat1 @ inverse(rmat)
But as I said above, a special property of a rotation matrix dictates that its inverse is the same as its transpose. So we write it as:
relative_rmat = rmat1 @ rmat.T
The order matters here: you should always start with the first rotation and pre-multiply the subsequent rotations. If you want to go the other way, you just need to take the inverse of relative_rmat. With simple matrix math, we can see that this is the same as following the rotations manually as described above.
relative_rmat_inverse
= relative_rmat.T
= (rmat1 @ rmat.T).T
= (rmat.T).T @ rmat1.T
= rmat @ rmat1.T
It is hard to detail everything here, and it can take a while to understand the math behind rotation matrices. I would recommend this as a good reference, but it can get very technical depending on your background. If you are new to this, start with the basics of robotics and coordinate transformations, and then move on to SO(3).
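To make the whole chain concrete, here is a small self-contained sketch (the rvec values are made up; in practice they come from your ArUco pose estimation) that also shows one way to turn the relative rotation into a single angle at the end:

import numpy as np
import cv2

rvec = np.array([0.1, -0.2, 0.3], dtype=float).reshape(3, 1)   # pose of 'marker'  w.r.t. the camera
rvec1 = np.array([0.0, 0.4, -0.1], dtype=float).reshape(3, 1)  # pose of 'marker1' w.r.t. the camera

rmat, _ = cv2.Rodrigues(rvec)    # 3x3 rotation matrix for 'marker'
rmat1, _ = cv2.Rodrigues(rvec1)  # 3x3 rotation matrix for 'marker1'

# Relative rotation as derived above (inverse of a rotation = its transpose)
relative_rmat = rmat1 @ rmat.T

# Convert back to a rotation vector; its norm is the rotation angle in radians
relative_rvec, _ = cv2.Rodrigues(relative_rmat)
angle = np.linalg.norm(relative_rvec)
print(np.degrees(angle))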

Find the Transformation Matrix that maps 3D local coordinates to global coordinates

I'm coding a calibration algorithm for my depth camera. This camera outputs a one-channel 2D image with the distance of every object in the image.
From that image, and using the camera and distortion matrices, I was able to create a 3D point cloud from the camera's perspective. Now I wish to convert those 3D coordinates to global/world coordinates. But since I can't use any pattern like a chessboard to calibrate the camera, I need another alternative.
So I was thinking: if I provide some ground points (in the camera perspective), they would define a plane that I know should have a Z coordinate close to zero in the global perspective. How should I proceed to find the transformation matrix that makes that plane horizontal?
[Image: local-coordinate ground plane, with an object on top]
I tried using OpenCV's solvePnP, but it didn't give me the correct transformation. I also thought about using OpenCV's estimateAffine3D, but I don't know where the global coordinates should be mapped to, since the provided ground points don't need to lie in any specific pattern/shape.
Thanks in advance
What you need is what's commonly called extrinsic calibration: a rigid transformation relating the 3D camera reference frame to the 'world' reference frame. Usually this is done by finding known 3D points in the world reference frame and their corresponding 2D projections in the image. This is what solvePnP does.
To find the best rotation/translation between two sets of 3D points, in the sense of minimizing the root mean square error, the solution is:
Theory: https://igl.ethz.ch/projects/ARAP/svd_rot.pdf
Easier explanation: http://nghiaho.com/?page_id=671
Python code (from the easier explanation site): http://nghiaho.com/uploads/code/rigid_transform_3D.py_
So, if you want to relate 3D points in the camera reference frame to the world reference frame, do the following:
As you proposed, define some 3D points with known position in the world reference frame, for example (but not necessarily) with Z=0. Put the coordinates in an Nx3 matrix P.
Get the corresponding 3D points in the camera reference frame. Put them in an Nx3 matrix Q.
From the Python file linked above, call rigid_transform_3D(P, Q). This will return a 3x3 matrix R and a 3x1 vector t.
Then, for any 3D point in the world reference frame p, as a 3x1 vector, you can obtain the corresponding camera point, q with:
q = R.dot(p)+t
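If you prefer not to download the file, here is a minimal numpy sketch of the SVD-based fit described in the two links (Kabsch-style, no scaling), which, as far as I recall, is the same recipe the linked rigid_transform_3D follows:

import numpy as np

def rigid_transform_3d(P, Q):
    # Find R (3x3) and t (3,) minimizing sum ||R @ p_i + t - q_i||^2
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)

    # Cross-covariance of the centered point sets
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T

    # Guard against a reflection (det(R) = -1)
    if np.linalg.det(R) < 0:
        Vt[-1, :] *= -1
        R = Vt.T @ U.T

    t = cQ - R @ cP
    return R, t

With P holding the world points and Q the camera points, the returned R and t give q = R.dot(p) + t as above.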
EDIT: answer for when the 3D positions of the points in the world are unspecified
Indeed, for this procedure to work, you need to know (or better, to specify) the 3D coordinates of the points in your world reference frame. As stated in your comment, you only know the points are in a plane but don't have their coordinates in that plane.
Here is a possible solution:
Take the selected 3D points in camera reference frame, let's call them q'i.
Fit a plane to these points, for example as described in https://www.ilikebigbits.com/2015_03_04_plane_from_points.html. The result of this will be a normal vector n. To fully specify the plane, you need also to choose a point, for example the centroid (average) of q'i.
As the points surely don't perfectly lie in the plane, project them onto the plane, for example as described in: How to project a point onto a plane in 3D?. Let's call these projected points qi.
At this point you have a set of 3D points, qi, that lie on a perfect plane, which should correspond closely to the ground plane (z=0 in world coordinate frame). The coordinates are in the camera reference frame, though.
Now we need to specify an origin and the direction of the x and y axes in this ground plane. You don't seem to have any criteria for this, so an option is to arbitrarily set the origin just "below" the camera center, and align the X axis with the camera optical axis. For this:
Project the point (0,0,0) onto the plane, as in the projection step above; call this o. Project the point (0,0,1) onto the plane and call it a. Compute the vector a-o, normalize it, and call it i.
o is the origin of the world reference frame, and i is the X axis of the world reference frame, in camera coordinates. Call j = n × i (cross product). j is the Y axis and we are almost finished.
Now obtain the X-Y coordinates of the points qi in the world reference frame by projecting them on i and j. That is, take the dot product of each (qi - o) with i to get the X values, and with j to get the Y values. The Z values are all 0. Call these (X, Y, 0) coordinates pi.
Use these values of pi and qi to estimate R and t, as in the first part of the answer!
Maybe there is a simpler solution. Also, I haven't tested this, but I think it should work. Hope this helps.
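Here is a rough numpy sketch of those steps, assuming the hypothetical name q_cam for the Nx3 array of selected ground points in the camera frame, a PCA-style plane fit for brevity (rather than the linked method), and the rigid_transform_3d sketch from the first part of the answer:

import numpy as np

def ground_frame_correspondences(q_cam):
    q_cam = np.asarray(q_cam, float)

    # Fit a plane: centroid c plus normal n (smallest singular vector)
    c = q_cam.mean(axis=0)
    _, _, Vt = np.linalg.svd(q_cam - c)
    n = Vt[-1]

    def project(p):
        # Orthogonal projection of a 3D point onto the fitted plane
        return p - np.dot(p - c, n) * n

    qi = np.array([project(p) for p in q_cam])

    # World origin just "below" the camera center, X axis along the optical axis
    o = project(np.zeros(3))
    a = project(np.array([0.0, 0.0, 1.0]))
    i = (a - o) / np.linalg.norm(a - o)
    j = np.cross(n, i)

    # World coordinates: X and Y from dot products with i and j, Z = 0
    rel = qi - o
    pi = np.column_stack([rel @ i, rel @ j, np.zeros(len(qi))])
    return pi, qi

Calling rigid_transform_3d(pi, qi) on the result then gives the R and t that map world points to camera points, as before.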

Camera projection for lines orthogonal to camera z-axis

I'm working on an object tracking application using openCV. I want to convert my pixel coordinates to world coordinates to get more meaningful information. I have read a lot about computing the perspective transform matrix, and I know about cv2.solvePnP. But I feel like my case should be special, because I'm tracking a runner on a track and field runway with the runway orthogonal to the camera's z-axis. I will set up the camera to ensure this.
If I just pick two points on the runway edge, I can calculate a linear conversion from pixels to world coords at that specific height (ground level) and distance from the camera (i.e. along that line). Then I reason that the runner will run on a line parallel to the runway at a different height and slightly different distance from the camera, but the lines should still be parallel in the image, because they will both be orthogonal to the camera z-axis. With all those constraints, I feel like I shouldn't need the normal number of points to track the runner on that particular axis. My gut says that 2-3 should be enough. Can anyone help me nail down the method here? Am I completely off track? With both height and distance from camera essentially fixed, shouldn't I be able to work with a much smaller set of correspondences?
Thanks, Bill
So, I think I've answered this one myself. It's true that only two correspondence points are needed given the following assumptions.
Assume:
World coordinates are set up with the X-axis and Z-axis parallel to the ground plane (so ground points have Y=0), with the X-axis parallel to the runway.
Camera is translated and possibly rotated about the X-axis (angled downward), but there is no rotation around the Y-axis (camera plane parallel to the runway and the X-axis) or the Z-axis (camera is level with respect to the ground).
Camera intrinsic parameters are known from camera calibration.
Method:
Pick two points in the ground plane with known coordinates in world and image. For example, two points on the runway edge as mentioned in the original post. The line connecting the points in world coordinates should not be parallel to either the X or the Z axis.
Since Y=0 for these points, ignore the second column of the rotation/translation matrix, reducing the projection to a planar homography transform (3x3 matrix). Now we have 9 degrees of freedom.
The rotation assumptions will enforce a certain form on the rotation/translation matrix. Namely, the first column and the first row will both be (1,0,0). This further reduces the number of degrees of freedom in the matrix to 5.
Constrain the values of the second column of the matrix such that cos^2(theta)+sin^2(theta) = 1. This reduces the number of unknowns to only 4. Two correspondence points will give us the 4 equations we need to calculate the homography matrix for the ground plane.
Factor out the camera intrinsic parameter matrix from the homography matrix, leaving the rotation/translation matrix for the ground plane.
Due to the rotation assumptions made earlier, the ignored column of the rotation/translation matrix can be easily constructed from the third column of the same matrix, which is the second column in the ground plane homography matrix.
Multiply back out with the camera intrinsic parameters to arrive at the final universal projection matrix (from only 2 correspondence points!)
My test implementation has worked quite well. Of course, it's sensitive to the accuracy of the two correspondence points provided, but that's kind of a given.
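For anyone wanting to try this, here is an untested sketch of one way to solve those four equations, parameterising the rotation by the tilt angle theta so the cos^2 + sin^2 = 1 constraint is automatic. All names are illustrative: K is the 3x3 intrinsic matrix from calibration, world_xz holds the (X, Z) ground coordinates of the two runway points, and pixels their (u, v) image positions. The X translation is taken as 0, per the assumption above that the first row is (1,0,0):

import numpy as np
from scipy.optimize import least_squares

def ground_plane_projection(K, world_xz, pixels):
    world_xz = np.asarray(world_xz, float)   # shape (2, 2): (X, Z) per point
    pixels = np.asarray(pixels, float)       # shape (2, 2): (u, v) per point

    def residuals(params):
        theta, ty, tz = params
        s, c = np.sin(theta), np.cos(theta)
        # Reduced projection for Y = 0: columns r1 = (1,0,0), r3 = (0,-s,c), t = (0,ty,tz)
        H = K @ np.array([[1.0, 0.0, 0.0],
                          [0.0,  -s,  ty],
                          [0.0,   c,  tz]])
        p = np.column_stack([world_xz, np.ones(len(world_xz))])
        proj = (H @ p.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        return (proj - pixels).ravel()

    # tz guess should be roughly the camera-to-runway distance
    theta, ty, tz = least_squares(residuals, x0=[0.0, 0.0, 10.0]).x
    s, c = np.sin(theta), np.cos(theta)

    # Rebuild the dropped second column r2 = (0, c, s) from the third, as described above
    Rt = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0,   c,  -s,  ty],
                   [0.0,   s,   c,  tz]])
    return K @ Rt   # full 3x4 projection for arbitrary (X, Y, Z) world points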

Marker Tracking + perspective warp of marker

I'm tracking a marker with ARToolKit+. I receive a model view matrix that looks about right. Now I'd like to warp the image in a way that the marker looks just like it would look if I looked straight at it. But whatever I do, the result looks just extremely distorted. I know that ARToolKit stores the 4x4 matrix in column major order, so I fixed that for OpenCV.
What I tried so far was:
1) fix the order to row major order
2) calculate the inverse with cvInverse (although transposing the 3x3 rotation part + inverting the translation should suffice)
3) use that matrix with cvPerspectiveWarp
Am I doing something wrong?
tl;dr:
I want this: https://www.youtube.com/watch?v=qZ-LU-C2p2Q
I get some distorted lines and lots of black instead.
Your problem is in converting from 4x4 to 3x3. The short answer is that you want to drop the 3rd column and bottom row to make the 3x3 and then premultiply with your camera matrix. For a longer explanation see here
Clarification
The pose you get from ARTK represents a transform from one place to another. When I say "the initial image appears without rotation" I meant that your transform goes from an initial state which has no rotation about the x or y axis to the current state. That is a fine assumption for most augmented reality applications, I mentioned it just to be thorough.
As for why you can drop the 3rd column: since you are transforming a plane, your z coordinate can be completely expressed by your x and y coordinates given the equation of your plane. If we assume that initially there is no rotation, then your initial z coordinate is a constant value. If there is rotation then z is not constant, but it varies deterministically in x and y according to its plane equation, which can still be expressed in one matrix (though you don't need that). Since in your case your 4x4 transform is probably expressing the transform from the marker lying flat at z = 0 to its current position, the 3rd column of your 4x4 matrix does nothing (it all gets multiplied by 0), so it can be dropped without affecting the result.
In short: forget about the rotation stuff; it's more complicated than you need. Just realize that the transform is from initial coordinates to final coordinates, and your initial coordinates are always
[x,y,0,1]
which makes your third column irrelevant.
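As a tiny illustration of that reduction (numpy, names are mine): with plane points always of the form [x, y, 0, 1], the third column of the 4x4 pose contributes nothing, so you drop it along with the bottom row and premultiply by the 3x3 camera matrix:

import numpy as np

def plane_homography(pose_4x4, K):
    reduced = pose_4x4[:3, [0, 1, 3]]   # drop 3rd column and bottom row
    return K @ reduced                  # maps [x, y, 1] on the plane to image pixels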
Update
I'm sorry! I just re-read your question and realized you just want to warp the marker so it looks like a straight-on view; I got caught up in describing a general transform from 4x4 to 3x3. The 4x4 transform you get from ARTK is not the transform that will de-warp the marker; it is the transform that moves the marker from the origin to its final position. To de-warp the marker like you asked, the process is similar but slightly different. I haven't done that before, but here is my guess.
First, you need to get the 4x4 transform between where the marker is in world space and where you would like it to appear to be after warping it. Right now the transform goes from the origin to the marker location. To change the transform to go from some point farther down the z axis (say 100) to the marker location, define the transform:
initial_marker_pose = [1, 0, 0, 0
                       0, 1, 0, 0
                       0, 0, 1, 100
                       0, 0, 0, 1];
Now you have the transform from the origin to what you want as your "inital" position, and the transform from the origin to your "final" position. To get the transform from initial to final simply
initial_to_final = origin_to_marker*initial_marker_pose.inv();
Now you would follow the process outlined in the link I gave you; in this case your initial z position is no longer 0, it is 100. Then, when you are finished, you will need to invert your 3x3 matrix. That is because this process takes you from a straight-on view to the one defined by the pose from ARTK, and you want the opposite of that. You will need to experiment with the initial z position: the smaller it is, the larger your marker will appear after de-warping.
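Putting the update together, here is my own untested sketch of the whole de-warp. The names are hypothetical: origin_to_marker is the ARTK pose already converted to row-major order, K is the 3x3 camera matrix, frame is the camera image, and marker_size/out_size only scale the result into an output image:

import numpy as np
import cv2

def dewarp_marker(frame, origin_to_marker, K, z_init=100.0,
                  out_size=(400, 400), marker_size=80.0):
    # 4x4 transform from a straight-on pose z_init units down the optical axis
    # to the observed marker pose.
    initial_marker_pose = np.eye(4)
    initial_marker_pose[2, 3] = z_init
    T = origin_to_marker @ np.linalg.inv(initial_marker_pose)

    # Reduce 4x4 -> 3x3 for points on the marker plane. In the straight-on
    # frame those points sit at z = z_init (not 0), so the third column is
    # scaled by z_init and folded into the translation column instead of
    # simply being dropped.
    reduced = np.column_stack([T[:3, 0], T[:3, 1], z_init * T[:3, 2] + T[:3, 3]])
    H = K @ reduced   # marker-plane (x, y) -> observed image pixels

    # Scale marker-plane coordinates into the output image (visualization only)
    w, h = out_size
    S = np.array([[w / marker_size, 0.0, w / 2.0],
                  [0.0, h / marker_size, h / 2.0],
                  [0.0, 0.0, 1.0]])

    # Invert so we go from the observed (warped) image to the straight-on view
    return cv2.warpPerspective(frame, S @ np.linalg.inv(H), out_size)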
Hopefully that works, sorry for the confusion about your question.

plane at infinity in projective space from intrinsic parameters

Suppose the cameras are calibrated, so the metric projection matrices M_i (3x4) are available for each view i of multiple views. The camera matrix K_i (3x3) of each view is also available. Can we calculate the location of the plane at infinity in projective space?
Sure, the plane at infinity is always the plane where w = 0. If you are applying affine transformations, it remains fixed. It only shifts if you use a homography.
Yes, it is theoretically possible. The plane at infinity always remains fixed in terms of the actual projective 3D world. However, it is imaged differently by a moving camera in each view, and in these cases we say that the plane at infinity is not at its canonical position. Instead of thinking that the camera moved, it is more convenient to think that the entire 3D projective space has moved! Thus, we invent a 3D homography to "blame" for this change. Mathematically, the 3D homography tags along between the projection matrix and the point:
x = (P*H)*X
So, to answer the question: although tricky, yes, you may recover it, given that you have sufficiently many reconstructed views of the scene. This is a process called auto-calibration, and it essentially involves one equation (that comes in many flavors) which unfortunately is non-linear. I suggest you look at the following:
http://nguyendangbinh.org/Proceedings/CVPR/1999/DATA/03_34.PDF
I believe it contains the most up-to-date method to iteratively compute the plane at infinity.
