Why are roll, pitch and yaw applied in that order? - opencv

I know that the common way for applying 3d rotation is Roll, then Pitch, then Yaw.
And I know that the order doesn't really matter as long as you choose an order and stick to it.
But it seems to be the reverse of the intuitive order.
If I asked you to explain which way your face is pointing, or which direction you are aiming a camera, you would probably start with Yaw (which side you look toward), then Pitch (how high/low you look) and then Roll (how the view is tilted).
Any explanation why it is the common order used?

First, I would like to state that there is no common way of doing rotation. There are just too many ways and each serves its purpose. I would even argue that Euler angles are a bad choice for the majority of cases.
Anyway, your question is probably about what it means to apply a rotation, and this can be interpreted in several ways. E.g., we could use a rotation matrix and compose it from the individual principal rotations. With column vectors, this would be:
R = Yaw * Pitch * Roll
We can interpret this from left to right, and we see exactly the interpretation you describe (first look left/right, then up/down, finally tilt your head). The important point is that each rotation also transforms the coordinate system in which the subsequent rotations are applied.
If we use this matrix R to transform a point P, then we would calculate P' = R * P. This is:
P' = Yaw * Pitch * Roll * P
We could calculate this also from right to left by introducing parentheses:
P' = Yaw * Pitch * (Roll * P)
So, we could start by applying Roll to P: P_Roll = Roll * P
P' = Yaw * (Pitch * P_Roll)
Then we apply Pitch, and finally Yaw. In this interpretation, however, every rotation is performed about the axes of the fixed global coordinate system.
So it is all just a matter of perspective.
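To make the two readings concrete, here is a dependency-free Python sketch (assuming the common convention of yaw about z, pitch about y, roll about x, with column vectors — the question doesn't pin down the axes, so that part is an assumption). Applying the composed R = Yaw * Pitch * Roll to a point matches applying Roll, then Pitch, then Yaw step by step about the fixed global axes:

```python
import math

def rot_x(a):  # Roll: rotation about the x axis
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):  # Pitch: rotation about the y axis
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):  # Yaw: rotation about the z axis
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

yaw, pitch, roll = rot_z(0.3), rot_y(-0.2), rot_x(0.1)

# Single composed matrix, read left to right: R = Yaw * Pitch * Roll
R = mat_mul(yaw, mat_mul(pitch, roll))

p = [1.0, 2.0, 3.0]

# Right-to-left reading: roll first, then pitch, then yaw (global axes)
p_step = mat_vec(yaw, mat_vec(pitch, mat_vec(roll, p)))

assert all(abs(a - b) < 1e-12 for a, b in zip(mat_vec(R, p), p_step))
```

The two results agree by associativity of matrix multiplication; only the interpretation (rotating frames vs. global frame) differs.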


Is there a way to shift the principal point of a Scene Kit camera?

I'd like to simulate the shift of a tilt-shift/perspective-control lens in Scene Kit on macOS.
Imagine the user has the camera facing a tall building at ground level, I'd like to be able to shift the 'lens' so that the projective distortion shifts (see e.g. Wikipedia).
Apple provides lots of physically-based parameters for SCNCamera (sensor height, aperture blade count), but I can't see anything obvious for this. It seems to exist in Unity.
Crucially I'd like to shift the lens so that the object stays in the same position relative to the camera. Obviously I could move the camera to get the effect, but the object needs to stay centred in the viewport (and I can't see a way to modify the viewport either). I've tried to modify the .projectionTransform matrix directly, but it was unsuccessful.
Thanks!
There is no API on SCNCamera that does that out of the box. As you guessed, one has to create a custom projection matrix and set it on the projectionTransform property.
I finally worked out the correct adjustment to the projection matrix. The maths is quite confusing to follow, because it is a 4x4 matrix rather than the 3x4 (or 4x3) you'd use for a plain camera projection matrix, which makes it especially hard to work out whether it expects row vectors or column vectors.
Anyway, the correct element for the y axis is .m32:
let camera = SCNNode()
camera.camera = SCNCamera()
let yShift: CGFloat = 1.0
camera.camera!.projectionTransform.m32 = yShift
Presumably .m31 will shift in the x axis, but I have to admit I haven't tested this.
When I thought about it a bit more, I also realised that the effect I actually wanted involves moving the camera too. Adjusting .m32 simulates moving the sensor, which will appear to move the subject relative to the camera, as if you had a wide angle lens and you were moving the crop. To keep the subject centred in frame, you need to move the camera's position too.
With a bit (a lot) of help from this blog post and in particular this code, I implemented this too:
let distance: CGFloat = 1.0 // calculate distance from subject here
let fovRadians = camera.camera!.fieldOfView * CGFloat.pi / 180.0
let yAdjust = tan(fovRadians / 2) * distance * yShift
camera.position = camera.position - camera.worldUp * yAdjust // assumes the SCNVector3 operators defined in the linked code
(any interested readers could presumably work out the x axis shift from the source above)
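To see why a single matrix element produces a pure lens shift, here is a dependency-free Python sketch using an OpenGL-style column-vector projection matrix (the transpose of SceneKit's row-vector layout, so the element modified below corresponds to .m32). The perspective() helper and the specific numbers are mine, not SceneKit API:

```python
import math

def perspective(fov_deg, aspect, zn, zf):
    # Standard OpenGL-style perspective matrix (column vectors, -z forward)
    f = 1.0 / math.tan(math.radians(fov_deg) / 2)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, (zf + zn) / (zn - zf), 2 * zf * zn / (zn - zf)],
            [0, 0, -1, 0]]

def project(P, p):
    x, y, z = p
    v = [P[i][0] * x + P[i][1] * y + P[i][2] * z + P[i][3] for i in range(4)]
    return (v[0] / v[3], v[1] / v[3])  # perspective divide -> NDC

P = perspective(60, 16 / 9, 0.1, 100)
P_shifted = [row[:] for row in P]
P_shifted[1][2] += 0.25  # the lens-shift term (transpose of .m32)

# The NDC y offset is the same at every depth: the image shifts as a
# whole, exactly like sliding the sensor behind the lens.
for z in (-1.0, -5.0, -50.0):
    (x0, y0) = project(P, (0.2, 0.3, z))
    (x1, y1) = project(P_shifted, (0.2, 0.3, z))
    assert abs((y1 - y0) - (-0.25)) < 1e-12  # constant shift
    assert abs(x1 - x0) < 1e-12              # x untouched
```

The shift term contributes shift * z to clip-space y, and the perspective divide by w = -z turns that into a depth-independent offset, which is why it behaves like a sensor shift rather than a camera rotation.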

Opencv get R and t from Essential Matrix

I'm new to OpenCV and computer vision. I want to find the R and t between two camera poses, so I generally followed the Wikipedia article:
https://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E
I find a set of corresponding pixel locations of the same points in the two images. I compute the essential matrix, then run the SVD and print the 2 possible R and 2 possible t.
[What runs as expected]
If I change only the rotation (one of roll, pitch, yaw) or only the translation (one of x, y, z), it works perfectly. For example, if I increase pitch to 15 degrees, then the recovered delta pitch is +14.9 degrees. If I only increase x by 10 cm, then the t vector is like [0.96, -0.2, -0.2].
[What goes wrong]
However, if I change both rotation and translation, the R and t are nonsense. For example, if I increase x by 10 cm and increase pitch to 15 degrees, then the delta angles are like [-23, 8, 0.5], and the t vector is like [0.7, 0.5, 0.5].
[Question]
I'm wondering why I cannot get a good result when I change the rotation and translation at the same time. It is also confusing why the unrelated rotation and translation components (roll, yaw, y, z) change so much.
Could anyone help me figure this out? Thanks.
[Solved and the reason]
OpenCV uses a right-handed coordinate system, i.e. the z-axis points from the xy plane toward the viewer. Our system uses a left-handed coordinate system, so whenever the motion involves the z-axis, the result is nonsense.
The problem was the difference between the coordinate conventions in use.
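A hedged sketch of the fix in plain Python: conjugating the pose by S = diag(1, 1, -1) converts between the two conventions, so you can either convert your input points to OpenCV's right-handed frame first, or convert the recovered R and t afterwards. The specific rotation and numbers below are arbitrary illustrations:

```python
import math

# S negates z and so converts between the LH and RH conventions; S @ S = I.
S = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

# Some pose expressed in the left-handed system (values are arbitrary).
a = 0.3
R_lh = [[math.cos(a), -math.sin(a), 0],
        [math.sin(a), math.cos(a), 0],
        [0, 0, 1]]
t_lh = [0.1, 0.0, 0.2]

# Conjugate by S to express the same motion right-handedly.
R_rh = mat_mul(S, mat_mul(R_lh, S))
t_rh = mat_vec(S, t_lh)

# If p2 = R_lh p1 + t_lh holds in the LH frame, the converted pose
# relates the converted points: S p2 = R_rh (S p1) + t_rh.
p1 = [1.0, 2.0, 3.0]
p2 = [r + t for r, t in zip(mat_vec(R_lh, p1), t_lh)]
lhs = mat_vec(S, p2)
rhs = [r + t for r, t in zip(mat_vec(R_rh, mat_vec(S, p1)), t_rh)]
assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
```

This identity follows from S S = I: S(R p + t) = (S R S)(S p) + S t.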

Right hand camera to left hand, OpenCV to Unity

I have a problem that has been puzzling me for the last few days. I have a camera pose obtained with OpenCV that is right-handed (X-right, Y-up, Z-back) and I would like to visualize it in Unity (X-right, Y-up, Z-forward), but I cannot manage to get the conversion right.
I tried using either quaternions or matrices. It should be just a matter of mirroring the Z axis and setting the rotation of the camera's transform in Unity to the computed transformation, however I cannot get the right conversion.
With quaternions I tried mirroring by negating the Z and W terms, which gave me a coordinate system (X-right, Y-down, Z-forward); that makes sense but it is not what I want to achieve. With matrices I think I should multiply my right-handed camera matrix by an identity matrix whose element [2,2] is set to -1, but I don't get what I want.
I am definitely missing something, probably something really stupid I forgot :)
Has anybody a suggestion?
A quaternion can be thought of as a rotation by an angle theta around an axis a = (ax, ay, az):
qx = ax * sin(theta/2)
qy = ay * sin(theta/2)
qz = az * sin(theta/2)
qw = cos(theta/2)
In a right-handed coordinate system a rotation of theta will be counter-clockwise, while in a left-handed coordinate system a rotation of theta will be clockwise (depending on your point-of-view, of course).
So to get your quaternion from a right-handed system to Unity's Left-Handed system you have to account for two factors:
The Z-Axis is negated
The direction of rotation is flipped from CCW to CW
The first factor is accounted for by negating the qz component of the quaternion. The second factor is accounted for by flipping the axis of rotation (rotating by 90 degrees around (1,0,0) is the inverse of rotating by 90 degrees around (-1,0,0)).
If your original right-handed quaternion is q and your left-handed quaternion is q', that means you end up with:
q'=(-qx, -qy, qz, qw)
Additional Note
Quaternions don't inherently have a handedness. A quaternion q applies equally well in RH or LH coordinate systems. However, when you apply the quaternion to a spatial vector, the resulting transformation takes on the handedness of the vector's space.
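The conversion can be checked numerically. Below is a small self-contained Python sketch (Hamilton product, (x, y, z, w) component order as in the answer; the helper names are mine). It verifies that mirroring z after rotating with q equals rotating the mirrored vector with q' = (-qx, -qy, qz, qw):

```python
import math

def q_mul(a, b):  # Hamilton product, (x, y, z, w) order
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)

def rotate(q, v):  # v' = q * v * conj(q)
    qx, qy, qz, qw = q
    p = q_mul(q_mul(q, (v[0], v[1], v[2], 0.0)), (-qx, -qy, -qz, qw))
    return p[:3]

def to_left_handed(q):  # the conversion from the answer
    qx, qy, qz, qw = q
    return (-qx, -qy, qz, qw)

def mirror_z(v):  # map a point between the RH and LH systems
    return (v[0], v[1], -v[2])

# Arbitrary right-handed rotation: 90 degrees about x (in RH, y -> z).
s, c = math.sin(math.pi / 4), math.cos(math.pi / 4)
q = (s, 0.0, 0.0, c)

v = (0.0, 1.0, 0.0)
# Rotating in RH then mirroring equals mirroring then rotating with q'.
lhs = mirror_z(rotate(q, v))
rhs = rotate(to_left_handed(q), mirror_z(v))
assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```

In Unity you would build the converted quaternion as new Quaternion(-q.x, -q.y, q.z, q.w) and mirror the position's z the same way.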

Comparison: TYPE_ROTATION_VECTOR with Complementary Filter

I have been working on orientation estimation, and I need to estimate the correct heading while walking in a straight line. After facing some roadblocks, I started from the basics again.
I have implemented a Complementary Filter from here, which uses Gravity vector obtained from Android (not Raw Acceleration), Raw Gyro data and Raw Magnetometer data. I am also applying a low pass filter on Gyro and Magnetometer data and use it as input.
The output of the Complementary filter is Euler angles and I am also recording TYPE_ROTATION_VECTOR which outputs device orientation in terms of 4D Quaternion.
So I thought to convert the Quaternions to Euler and compare them with the Euler obtained from Complementary filter. The output of Euler angles is shown below when the phone is kept stationary on a table.
As can be seen, the Yaw values are off by a huge margin.
What am I doing wrong in this simple case when the phone is stationary?
Then I walked in my living room and I get the following output.
The shape of the Complementary filter output looks very good and is very close to that of Android, but the values are off by a huge margin.
Please tell me what am I doing wrong?
I don't see any need to apply a low-pass filter to the Gyro. Since you're integrating the gyro to get rotation, it could mess everything up.
Be aware that TYPE_GRAVITY is a composite sensor reading synthesized from gyro and accel inside Android's own sensor fusion algorithm. Which is to say that this has already been passed through a Kalman filter. If you're going to use Android's built-in sensor fusion anyway, why not just use TYPE_ROTATION_VECTOR?
Your angles are in radians by the looks of it, and the error in the first set wasn't too far from 90 degrees. Perhaps you've swapped X and Y in your magnetometer inputs?
Here's the approach I would take: first write a test that takes accel and mag and synthesizes Euler angles from them; ignore the gyro for now. Walk around the house and confirm that it does the right thing, but is jittery.
Next, slap an aggressive low-pass filter on your algorithm, e.g.
yaw0 = yaw;
yaw = computeFromAccelMag(); // yaw in radians
factor = 0.2; // between 0 and 1; experiment
yaw = yaw * factor + yaw0 * (1-factor);
Confirm that this still works. It should be much less jittery but also sluggish.
Finally, add gyro and make a complementary filter out of it.
dt = time_since_last_gyro_update;
yaw += gyroData[2] * dt; // test: might need to subtract instead of add
yaw0 = yaw;
yaw = computeFromAccelMag(); // yaw in radians
factor = 0.2; // between 0 and 1; experiment
yaw = yaw * factor + yaw0 * (1-factor);
The key thing is to test every step of the way as you develop your algorithm, so that when a mistake happens, you'll know what caused it.
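The steps above can be combined into one runnable sketch. This is a minimal Python illustration of the same structure; computeFromAccelMag is replaced by a synthetic noisy heading, and the atan2-based angle wrapping and the specific factor are my additions, not part of the original pseudocode:

```python
import math

def complementary_yaw(yaw, gyro_z, dt, accel_mag_yaw, factor=0.2):
    """One update step: integrate the gyro, then pull gently toward
    the absolute accel/mag heading. factor is between 0 and 1."""
    yaw += gyro_z * dt                      # fast, but drifts over time
    # Blend on the circle so the wrap at +/-pi doesn't cause jumps.
    err = math.atan2(math.sin(accel_mag_yaw - yaw),
                     math.cos(accel_mag_yaw - yaw))
    return yaw + factor * err               # slow correction, no drift

# Stationary phone: true heading 1.0 rad, gyro reads ~0, noisy heading.
yaw = 0.0
for i in range(200):
    noisy_heading = 1.0 + 0.05 * math.sin(i)  # jittery absolute reading
    yaw = complementary_yaw(yaw, gyro_z=0.0, dt=0.02,
                            accel_mag_yaw=noisy_heading)
assert abs(yaw - 1.0) < 0.06  # converged near the true heading
```

The gyro term tracks fast motion between absolute readings, while the low-pass blend removes the long-term gyro drift; that division of labour is what makes it a complementary filter.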

XNA 4.0 Camera Question

I'm having trouble understanding how the camera works in my test application. I've been able to piece together a working camera - now I am trying to make sure I understand how it all works. My camera is encapsulated in its own class. Here is the update method that gets called from my Game.Update() method:
public void Update(float dt)
{
Yaw += (200 - Game.MouseState.X) * dt * .12f;
Pitch += (200 - Game.MouseState.Y) * dt * .12f;
Mouse.SetPosition(200, 200);
_worldMatrix = Matrix.CreateFromAxisAngle(Vector3.Right, Pitch) * Matrix.CreateFromAxisAngle(Vector3.Up, Yaw);
float distance = _speed * dt;
if (_game.KeyboardState.IsKeyDown(Keys.E))
MoveForward(distance);
if (_game.KeyboardState.IsKeyDown(Keys.D))
MoveForward(-distance);
if (_game.KeyboardState.IsKeyDown(Keys.S))
MoveRight(-distance);
if (_game.KeyboardState.IsKeyDown(Keys.F))
MoveRight(distance);
if (_game.KeyboardState.IsKeyDown(Keys.A))
MoveUp(distance);
if (_game.KeyboardState.IsKeyDown(Keys.Z))
MoveUp(-distance);
_worldMatrix *= Matrix.CreateTranslation(_position);
_viewMatrix = Matrix.Invert(_worldMatrix); // What's going on here???
}
First of all, I understand everything in this method other than the very last part where the matrices are being manipulated. I think the terminology is getting in my way as well. For example, my _worldMatrix is really a Rotation Matrix. What really baffles me is the part where the _viewMatrix is calculated by inverting the _worldMatrix. I just don't understand what this is all about.
In prior testing, I always used Matrix.CreateLookAt() to create a view matrix, so I'm a bit confused. I'm hoping someone can explain in simple terms what is going on.
Thanks,
-Scott
One operation the view matrix performs for the graphics pipeline is converting a 3d point from world space (the x, y, z we all know & love) into view (or camera) space: a space where the camera is considered to be the center of the world (0,0,0) and all points/objects are relative to it. So while a point may be at 1,1,1 relative to the world, what are its coordinates relative to the camera's location? As it turns out, to find out you can transform that point by the inverse of a matrix representing the camera's world-space position/rotation.
It kinda makes sense if you think about it. Let's say the camera position is 2,2,2 and an arbitrary point is at 3,3,3. We know that the point is 1,1,1 away from the camera, right? So what transformation would you apply to the point 3,3,3 for it to become 1,1,1 (its location relative to the camera)? You would translate 3,3,3 by -2,-2,-2 to get 1,1,1, and -2,-2,-2 is the camera's negated position. That example covers translation because it is relatively easy to grok, but basically the same happens for rotation. Don't expect to be able to simply negate everything to invert a matrix, though... there is a little more going on for the rotation part.
The Matrix.CreateLookAt() method automatically returns the inverted matrix so you don't really notice it happening unless you reflect its code.
Taking that one step further, the Projection matrix then takes that point in view space and projects it onto a flat surface and that point that started out in 3d space is now in 2d space.
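The translation example generalizes: for a rigid world matrix composed of a rotation R and a translation t, the inverse is R^T combined with -R^T t. Here is a small dependency-free Python sketch (column-vector convention; the helper names are mine, not XNA's) that reproduces the 2,2,2 / 3,3,3 example and checks the general round trip:

```python
import math

def world_matrix(a, t):
    """Camera world transform: rotation by a about the y axis plus
    translation t (column-vector convention)."""
    c, s = math.cos(a), math.sin(a)
    R = [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return R, t

def invert_rigid(R, t):
    # For a rigid transform, inverse rotation is the transpose,
    # and the inverse translation is -R^T t.
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return Rt, t_inv

def apply(R, t, p):
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]

# Camera sitting at (2, 2, 2) with no rotation, point at (3, 3, 3):
R, t = world_matrix(0.0, [2.0, 2.0, 2.0])
Rv, tv = invert_rigid(R, t)          # the view matrix
p_view = apply(Rv, tv, [3.0, 3.0, 3.0])
assert p_view == [1.0, 1.0, 1.0]     # (1,1,1) relative to the camera

# With rotation the same identity holds: view(world(p_local)) == p_local
R, t = world_matrix(0.7, [2.0, 2.0, 2.0])
Rv, tv = invert_rigid(R, t)
p_local = [0.5, -1.0, 4.0]
p_back = apply(Rv, tv, apply(R, t, p_local))
assert all(abs(a - b) < 1e-12 for a, b in zip(p_back, p_local))
```

This is exactly what Matrix.Invert(_worldMatrix) computes, which is why inverting the camera's world matrix yields a view matrix equivalent to what Matrix.CreateLookAt() would produce.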
