Why do we need to convert a rotation vector to a rotation matrix to calculate an angle - opencv

I am not finding any proper source to understand why we need to convert a rotation vector to a rotation matrix [in the context of calculating the angle between two ArUco markers].
We are using
rmat, _ = cv2.Rodrigues(rvec)
rmat1, _ = cv2.Rodrigues(rvec1)
relative_rmat = rmat1 @ rmat.T
My questions are:
Why are we converting the rotation vector to a rotation matrix?
And can I please get the source of relative_rmat's formula? I am trying to understand the geometrical concept.
I have tried to understand from Wikipedia, but I am getting more confused. It would be helpful if anyone could provide a source for the concept behind both questions.

Translation of a rigid body (or tvec as used in OpenCV conventions) lives in 3D Euclidean space. We call this the 'configuration space'.
Assume there is a rigid body at a point we call pos1. A 3x1 vector pos1 = [x1, y1, z1] completely defines its position uniquely. The term 'unique' means that there is no other way to define pos1 without [x1, y1, z1], and also if you go to [x1, y1, z1], it will always be pos1.
Any attitude (a rotation of a rigid body around three-dimensional space) can be represented in many different ways. However, attitudes live in a place called 'Special Orthogonal Space or SO(3)'. This is the configuration space for rotations, and elements in this space are what you were referring to as 'rotation matrices'.
All other ways of defining a rotation, like Euler angles, rotation vectors (rvec in OpenCV), or quaternions, are 'local parameterizations' of a given rotation matrix. So they have several issues, including not being unique at some rotations. The Gimbal Lock wiki page has some nice visualizations.
To avoid such issues, the simplest way is to use rotation matrices. Even though they seem complex, rotation matrices can be much easier to work with once you get used to them. Given the properties of SO(3), the inverse of a rotation matrix is the transpose of that matrix (which is why you see rmat.T when computing the relative rotation).
Let's assume that you have two markers named 'marker' and 'marker1' that correspond to rvec and rvec1 respectively. Your rmat is the rotation of 'marker' with respect to the camera frame, or how a vector in 'marker' can be represented in the camera frame (I know this can be confusing, but this is how it is defined, so stay with me).
Similarly, rmat1 is how a vector in 'marker1' is represented in the camera frame. Also, keep in mind that these matrices are directional, meaning we need inverse(rmat1) if we want to find how a vector in the camera frame is represented in the 'marker1' frame.
Your relative_rmat is how you represent a vector in 'marker' within the 'marker1' frame. You cannot randomly hop between markers; you always need to go through a common frame. First you transform a vector in 'marker' to the camera frame, and then transform that into the 'marker1' frame. We can write that as
relative_rmat = rmat1 @ inverse(rmat)
But as I said above, a special property of a rotation matrix dictates that its inverse is the same as its transpose. So we write it as:
relative_rmat = rmat1 @ rmat.T
The order matters here, and you should always start with the first rotation and pre-multiply the subsequent rotations. If you want to go the other way, you just need to take the inverse of relative_rmat. With simple matrix math, we can see that this is the same as following the rotations described above manually.
relative_rmat_inverse
= relative_rmat.T
= (rmat1 @ rmat.T).T
= (rmat.T).T @ rmat1.T
= rmat @ rmat1.T
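Putting it all together, here is a minimal Python/OpenCV sketch of the idea (the rvec values are made up; in practice they come from your marker pose estimation):

import cv2
import numpy as np

# Made-up rotation vectors; in practice these come from your marker pose estimation
rvec = np.array([[0.1], [-0.2], [0.3]], dtype=np.float64)
rvec1 = np.array([[0.0], [0.5], [-0.1]], dtype=np.float64)

# cv2.Rodrigues returns (matrix, jacobian); keep only the 3x3 rotation matrix
rmat, _ = cv2.Rodrigues(rvec)
rmat1, _ = cv2.Rodrigues(rvec1)

# Relative rotation; the inverse of a rotation matrix is its transpose
relative_rmat = rmat1 @ rmat.T

# Convert back to a rotation vector; its norm is the relative rotation angle
relative_rvec, _ = cv2.Rodrigues(relative_rmat)
angle_deg = np.degrees(np.linalg.norm(relative_rvec))
print(angle_deg)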
It is hard to detail everything here, and it can take a while to understand the math behind the rotation matrices. I would recommend this as a good reference, but it can get very technical depending on your background. If you are new to this, start with basics of robotics, coordinate transformations, and then move to SO(3).

Related

Determine rotation of data on surface of sphere

Let's suppose I have two identical data sets from the surface of a sphere, e.g. like in the image below, where one of them is rotated by arbitrary Euler angles.
I want to find out the Euler angles by which the data were rotated with respect to each other.
What is the best (most efficient) way to do that?
Does it make sense to use pattern matching algorithms like this one?
Can this task be done without projecting the sphere surface onto a 2D plane? Such projections unavoidably introduce heavy distortions.
BTW, I'm not asking for code or a working solution. I'm more interested in the general ideas or possible toolboxes to use.
The easiest way to do this is to extract features from one sphere, as x1, x2, ..., xN and from the other as y1, y2, ..., yN and to estimate the rotation matrix using the Procrustes trick by concatenating these 3D direction vectors into the A and B matrices.
In fact, I believe this is possible with as few as two feature points (so in the example above, use the two local maxima denoted in red). If the first solution gives an orthogonal matrix with a negative determinant, try switching the labels on the pair so that the scheme returns a proper rotation matrix.
From the rotation matrix, it's straightforward to extract the Euler angles (but consider using the Rodrigues parameterization).
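As a minimal numpy sketch of that Procrustes/Kabsch estimation (the function name is mine; x and y are assumed to be N x 3 arrays of matched 3D direction vectors):

import numpy as np

def estimate_rotation(x, y):
    # Estimate R such that y_i is approximately R @ x_i, via the Kabsch/Procrustes SVD
    h = x.T @ y                             # 3x3 cross-covariance of the matched directions
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against a reflection (det = -1)
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

The resulting matrix can then be converted to Euler angles or a Rodrigues vector with your library of choice.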
Alternatively, go down the rabbit hole on spherical harmonics and convolution.

Measure distance to object with a single camera in a static scene

Let's say I am placing a small object on a flat floor inside a room.
First step: Take a picture of the room floor from a known, static position in the world coordinate system.
Second step: Detect the bottom edge of the object in the image and map the pixel coordinate to the object position in the world coordinate system.
Third step: Using a measuring tape, measure the real distance to the object.
I could move the small object, repeat these three steps for every pixel coordinate, and create a lookup table (key: pixel coordinate; value: distance). This procedure is accurate enough for my use case. I know that it is problematic if there are multiple objects (one object could cover another).
My question: Is there an easier way to create this lookup table? Accidentally changing the camera angle by a few degrees destroys the hard work. ;)
Maybe it is possible to execute the three steps for a few specific pixel coordinates or positions in the world coordinate system and perform some "calibration" to calculate the distances with the computed parameters?
If the floor is flat, its equation is that of a plane, let
a.x + b.y + c.z = 1
in the camera coordinates (the origin is the optical center of the camera, XY forms the focal plane and Z the viewing direction).
Then a ray from the camera center to a point on the image at pixel coordinates (u, v) is given by
(u, v, f).t
where f is the focal length.
The ray hits the plane when
(a.u + b.v + c.f) t = 1,
i.e. at the point
(u, v, f) / (a.u + b.v + c.f)
Finally, the distance from the camera to the point is
p = √(u² + v² + f²) / (a.u + b.v + c.f)
This is the function that you need to tabulate. Assuming that f is known, you can determine the unknown coefficients a, b, c by taking three non-aligned points, measuring the image coordinates (u, v) and the distances, and solving a 3x3 system of linear equations.
From the last equation, you can then estimate the distance for any point of the image.
The focal distance can be measured (in pixels) by looking at a target of known size, at a known distance. By proportionality, the ratio of the distance over the size is f over the length in the image.
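A small numpy sketch of that calibration step (the numbers are placeholders; (u, v) are image coordinates measured from the principal point, and f is in pixels):

import numpy as np

# Three non-aligned calibration points: image coordinates (u, v) and measured distances p
f = 800.0                                   # focal length in pixels (assumed known)
uv = np.array([[ 120.0,  -40.0],
               [-210.0,   95.0],
               [  60.0,  230.0]])
p = np.array([2.45, 3.10, 1.80])            # measured distances (made-up values)

# Each point gives a linear equation a*u + b*v + c*f = sqrt(u^2 + v^2 + f^2) / p
A = np.column_stack([uv[:, 0], uv[:, 1], np.full(3, f)])
rhs = np.sqrt(uv[:, 0]**2 + uv[:, 1]**2 + f**2) / p
a, b, c = np.linalg.solve(A, rhs)

def distance(u, v):
    # Distance from the camera to the floor point seen at image coordinates (u, v)
    return np.sqrt(u**2 + v**2 + f**2) / (a*u + b*v + c*f)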
Most vision libraries (including OpenCV) have built-in functions that will take a couple of points from a camera reference frame and the related points from a Cartesian plane and generate your warp matrix (affine transformation) for you. (Some are fancy enough to include non-linear mappings with enough input points, but that brings you back to your time-to-calibrate issue.)
A final note: most vision libraries use some type of grid to calibrate off of, i.e. a checkerboard pattern. If you wrote your calibration to work off of such a sheet, then you would only need to measure the distance to one target object, as the transformations would be calculated from the sheet and the target would just provide the world offsets.
I believe what you are after is called a Projective Transformation. The link below should guide you through exactly what you need.
Demonstration of calculating a projective transformation with proper math typesetting on the Math SE.
Although you can solve this by hand and write that into your code... I strongly recommend using a matrix math library, or even writing your own matrix math functions, rather than hand-calculating the equations, as you would have to solve them symbolically to turn them into code, and that is very laborious and prone to miscalculation.
Here are just a few tips that may help you with clarification (applying it to your problem):
-Your A matrix (source) is built from the 4 xy points in your camera image (pixel locations).
-Your B matrix (destination) is built from your measurements in the real world.
-For fast recalibration, I suggest marking points on the ground to be able to quickly place the cube at the 4 locations (and subsequently get the altered pixel locations in the camera) without having to remeasure.
-You will only have to do steps 1-5 (once) during calibration, after that whenever you want to know the position of something just get the coordinates in your image and run them through step 6 and step 7.
-You will want your calibration points to be as far away from each other as possible (within reason; at extreme distances in a vanishing-point situation, you start rapidly losing pixel density and therefore source image accuracy). Make sure that no 3 points are collinear (simply put, make your 4 points approximately square at almost the full span of your camera FOV in the real world).
PS: I apologize for not writing this out here, but they have fancy math editing and it looks way cleaner!
Final steps to applying this method to this situation:
In order to perform this calibration, you will have to set a global home position (likely easiest to do this arbitrarily on the floor and measure your camera position relative to that point). From this position, you will need to measure your object's distance in both x and y coordinates on the floor. Although a more tightly packed calibration set will give you more error, the easiest solution may simply be to use a dimensioned sheet (I am thinking a piece of printer paper or a large board or something). The reason that this is easier is that it has built-in axes (i.e. the two sides are orthogonal), so you just use the four corners of the sheet and canned distances in your calibration. EX: for a piece of paper your points would be (0,0), (0,8.5), (11,8.5), (11,0)
Using those points and the pixels you get will create your transform matrix, but that still just gives you a global x,y position on axes that may be hard to measure on (they may be skewed depending on how you measured/calibrated). So you will need to calculate your camera offset:
object in real world coords (from steps above): x1, y1
camera coords (Xc, Yc)
dist = sqrt( pow(x1-Xc,2) + pow(y1-Yc,2) )
If it is too cumbersome to measure the position of the camera from the global origin by hand, you can instead measure the distance to 2 different points and feed those values into the above equation to calculate your camera offset, which you then store and use any time you want to get the final distance.
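A rough OpenCV sketch of that workflow, assuming the paper-corner approach above (the pixel coordinates and camera offset are placeholders):

import cv2
import numpy as np

# Pixel locations of the four paper corners in the image (placeholders), and the
# corresponding world coordinates on the floor in inches, as in the example above
src = np.float32([[412, 615], [988, 602], [1003, 940], [395, 951]])
dst = np.float32([[0, 0], [0, 8.5], [11, 8.5], [11, 0]])
M = cv2.getPerspectiveTransform(src, dst)   # 3x3 pixel -> floor transform

# Camera position on the floor, measured by hand or solved from two known distances
Xc, Yc = -14.0, 6.0                         # placeholder offsets, same units as dst

def distance_to(px, py):
    # Map a pixel to floor coordinates and return its distance from the camera
    x1, y1 = cv2.perspectiveTransform(np.float32([[[px, py]]]), M)[0, 0]
    return np.sqrt((x1 - Xc)**2 + (y1 - Yc)**2)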
As already mentioned in the previous answers, you'll need a projective transformation or simply a homography. However, I'll consider it from a more practical point of view and try to keep the summary short and simple.
So, given the proper homography, you can warp your picture of a plane such that it looks like you took it from above (like here). Even simpler, you can transform a pixel coordinate of your image to world coordinates of the plane (the same is done during the warping for each pixel).
A homography is basically a 3x3 matrix, and you transform a coordinate by multiplying it with the matrix. You may now think: wait, a 3x3 matrix and 2D coordinates? You'll need to use homogeneous coordinates.
However, most frameworks and libraries will do this handling for you. What you need to do is find (at least) four points (x/y coordinates) on your world plane/floor (preferably the corners of a rectangle, aligned with your desired world coordinate system), take a picture of them, measure the pixel coordinates, and pass both to the "find-homography" function of your desired computer vision or math library.
In OpenCV that would be findHomography; here is an example (the method perspectiveTransform then performs the actual transformation).
In Matlab you can use something from here. Make sure you are using a projective transformation as transform type. The result is a projective tform, which can be used in combination with this method, in order to transform your points from one coordinate system to another.
In order to transform into the other direction you just have to invert your homography and use the result instead.
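With OpenCV that might look roughly like this (all point values are invented):

import cv2
import numpy as np

# Pixel coordinates of four points on the floor, and the corresponding world
# coordinates of those points on the plane (both sets are invented here)
img_pts = np.float32([[102, 540], [860, 562], [815, 240], [180, 228]])
world_pts = np.float32([[0, 0], [2.0, 0], [2.0, 1.5], [0, 1.5]])

H, _ = cv2.findHomography(img_pts, world_pts)

# Image -> world for an arbitrary pixel
pt = np.float32([[[455, 390]]])
world = cv2.perspectiveTransform(pt, H)

# World -> image: use the inverted homography
img = cv2.perspectiveTransform(world, np.linalg.inv(H))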

Understanding the output of solvepnp?

I have been using solvePnP() for the calculation of the rotation and translation matrices. But the Euler angles calculated from the obtained rotation matrix gave very erratic values. Trying to find the problem, I had a set of 2D projection points for my marker and kept the other parameters of solvePnP() constant.
Eg values:
2D points
[219.67473, 242.78395; 363.4151, 238.61298; 503.04855, 234.56117; 501.70917, 628.16742; 500.58069, 959.78564; 383.1756, 972.02679; 262.8746, 984.56982; 243.17044, 646.22925]
The Euler angle theta(x) calculated from the output rotation matrix of solvePnP() was -26.4877.
Next, I incremented only the x value of the first point (i.e. 219.67473) by 0.1 to check the variation of the theta(x) Euler angle (keeping the remaining points and the other parameters constant) and ran solvePnP() again. For that very small change, I got values which were decreasing: -19 degrees, -18 degrees (for x coord = 223.074), then a sudden jump to 27 degrees for a while (for x coord = 223.174 to 226.974), then a drop to 1.3 degrees (for x coord = 227.074).
I cannot understand this behaviour at all. Could somebody please explain?
My Euler angle calculation from the rotation matrix uses this procedure.
Try Rodrigues() for the conversion between rotation matrix and rotation vector to make sure everything is clean and right. The non-RANSAC version can be very sensitive to outliers that create a huge error in the parameters and thus bias the solution. Using the RANSAC version of solvePnP may make it more stable to outliers. For example, adding too much to one of the point coordinates will eventually make it an outlier, and it won't influence the solution after that.
If everything fails, write a series of unit tests: create an artificial set of points in 3D (possibly non-planar), apply a simple translation first, in a second variant apply rotation only, and in a third test apply both. Project the points using your camera matrix and then plug your 2D points, 3D points, and projection matrix into your code to find the pose. If the result deviates from the inverse of the translations and rotations you applied to the points, look for a bug in the parameters fed to PnP.
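A sketch of such a synthetic test, with made-up intrinsics, a non-planar point set, a known pose, and no lens distortion:

import cv2
import numpy as np

# Made-up intrinsics and a non-planar set of 3D points
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
obj_pts = np.float32([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                      [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])

# Known ground-truth pose to recover
rvec_true = np.float32([[0.1], [-0.3], [0.2]])
tvec_true = np.float32([[0.5], [-0.2], [5.0]])

# Project the points with the known pose, then ask solvePnP for the pose back
img_pts, _ = cv2.projectPoints(obj_pts, rvec_true, tvec_true, K, None)
ok, rvec_est, tvec_est = cv2.solvePnP(obj_pts, img_pts, K, None)

print(rvec_est.ravel(), rvec_true.ravel())  # should match closely
print(tvec_est.ravel(), tvec_true.ravel())  # should match closely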
It seems the coordinate systems are different. OpenCV uses a right-handed coordinate system with Y pointing downwards. At nghiaho.com it says the calculations are based on this, and if you look at the axes they don't seem to match. I guess you are using Rodrigues for the matrix computation? Try comparing rotation vectors as well.

Is there a reverse function of lookat for glMatrix?

I am using glMatrix to code WebGL and want to get the eye position, focal point, and up direction from the existing projection and view matrix (kind of like the reverse of the lookAt function). Is there any way to do this?
I didn't implement one, no. I'm not even sure that you could decompose it into the original vectors, for that matter. The lookAt point could be anywhere along a ray from the origin, and how would you determine what the appropriate up vector was? I'm thinking this is a one-way algorithm (just too lazy to prove it!)
Beyond that, however, I question whether you would want to do this even if there were a method for it. I'd be willing to bet that it's almost always more beneficial to track the values you're using and manipulate them, rather than to try to pull them back and forth between matrices and vectors.
Yes and no: yes, you can invert the model view transformation, and no, you will not get exactly the same three vectors back.
The model view transformation of lookAt is very similar to the connectTo operation as used in CSG models. It mounts your scene in front of your camera. This is done by a translation and three axis rotations. The eye point is translated to (0,0,0) and all further rotation is done around it. You can easily derive the eye point by transforming (0,0,0) with the inverse matrix.
But the center point is only used for adjusting the axis of view along the -Z axis. In OpenGL the eye faces -Z. The distance between center and eye is lost. So you can easily get a center point along your axis of view if you define the distance yourself. Let's say we want a distance of d. Then we just need to transform (0,0,-d) with the inverse matrix and we get a valid center point, though not exactly the original one. The center point defines only two rotation angles, the camera pan and tilt.
Even worse is the reconstruction of the up vector. It is only used for the roll angle of the camera and thus contributes only one scalar value. Thus for the inverse transformation you can choose not only any positive value along the Y axis, but any point in the YZ plane with a positive Y value. To get an up vector perfectly normal to the viewing axis and of size 1, we just transform (0,1,0) with the inverse matrix. Remember to transform it as a vector this time (not as a point).
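Expressed with numpy rather than glMatrix (a sketch, assuming m is the 4x4 view matrix in the column-vector convention and d is the arbitrary eye-to-center distance you choose):

import numpy as np

def decompose_view(m, d=1.0):
    # Recover an (eye, center, up) triple that would reproduce the view matrix m
    inv = np.linalg.inv(m)
    eye = (inv @ np.array([0.0, 0.0, 0.0, 1.0]))[:3]    # camera origin as a point
    center = (inv @ np.array([0.0, 0.0, -d, 1.0]))[:3]  # a point d units along -Z
    up = (inv @ np.array([0.0, 1.0, 0.0, 0.0]))[:3]     # the Y axis as a direction (w = 0)
    return eye, center, up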
Now we have eye, center and up reconstructed in a way that will produce exactly the same result from lookAt next time. But since this matrix contains only 6 values of information (translation, pan, tilt, roll), we had to choose the 3 values that were lost (the distance from center to eye, and the size and angle of the up vector in the camera's YZ plane).
The model view matrix can of course do other transformation (any affine) but the lookAt function is using this matrix only for translation and rotation. It is adjusting the scene in front of the camera without distorting it.

plane at infinity in projective space from intrinsic parameters

Suppose the cameras are calibrated, so metric projection matrices M_i (3x4) are available for each view i of multiple views, and K_i (3x3), the camera matrix of each view, is also available. Can we calculate the location of the plane at infinity in projective space?
Sure, the plane at infinity is always the plane where w = 0. If you are applying affine transformations, it remains fixed. It only shifts if you use a homography.
Yes, it is theoretically possible. The plane at infinity always remains fixed in terms of the actual projective 3D world. However, it is imaged differently by a moving camera in each view, and in those cases we say that the plane at infinity is not at its canonical position. Instead of thinking that the camera moved, it is more convenient to think that the entire 3D projective space has moved! Thus, we invent a 3D homography to "blame" for this change. Mathematically, the 3D homography tags along the right side of the projection matrix:
x = (P*H)*X
So, to answer the question: although tricky, yes, you may recover it, given that you have sufficiently many reconstructed views of the scene. This is a process called auto-calibration, and it essentially involves one equation (which comes in many flavors) but unfortunately gives non-linear equations. I suggest you look at the following:
http://nguyendangbinh.org/Proceedings/CVPR/1999/DATA/03_34.PDF
I believe it contains the most up-to-date method to iteratively compute the plane at infinity.
