Why is the OpenGL transformation order reversed compared to what is written in code? - glm-math

In my previous question, I was confused about the order of multiplication used to build a final transform matrix, but I did not describe the question clearly. So I am creating a new question here.
The author says:
Here we first rotate the container around the origin (0,0,0) and once
it's rotated, we translate its rotated version to the bottom-right
corner of the screen. Remember that the actual transformation order
should be read in reverse: even though in code we first translate and
then later rotate, the actual transformations first apply a rotation
and then a translation.
So, why is the actual transformation order applied in reverse?
I have searched the web and found this mentioned in some places, like page 16 in this lecture03 slide, and page 12 on this lecture, but none of them describes the reason behind it.

Generally these operations are defined by right-multiplying the given transform matrix on the modelview matrix (e.g. MV * T, where MV is the modelview matrix and T is the transform in question).
At the end of the transform chain, geometry vectors are implicitly right multiplied on the accumulated set of matrix transforms (e.g. MV * T1 * T2 * T3 * x, where T1, T2, and T3 are transformations (usually T1 = translate, T2 = rotate, T3 = scale), and x is any geometry vector). Thus, the last transform applied actually touches the geometry vector first (e.g. T3 * x). It's effectively grouped as (MV * (T1 * (T2 * (T3 * x)))), but of course it's equivalent to ((((MV * T1) * T2) * T3) * x) because matrix multiplication is associative.

Just like in linear algebra, operation order is reversed.
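To make the reversal concrete, here is a minimal sketch in plain Python using 2D homogeneous matrices. The helper names (matmul, translate, rotate, apply) and the sample values are assumptions made for illustration; they are not part of OpenGL or glm. The matrix built in "translate, then rotate" code order is T * R, and applying it to a point gives the same result as rotating the point first and translating it afterwards.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def translate(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotate(degrees):
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(m, point):
    """Apply a 3x3 homogeneous matrix to a 2D point (column vector)."""
    out = matmul(m, [[point[0]], [point[1]], [1.0]])
    return (out[0][0], out[1][0])

T = translate(5.0, 0.0)
R = rotate(90.0)
p = (1.0, 0.0)

# Code order: translate first, then rotate, so the accumulated matrix is T * R.
combined = apply(matmul(T, R), p)

# What actually happens to the point: R touches it first, then T.
step_by_step = apply(T, apply(R, p))

# Swapping the code order gives R * T, which translates first and then rotates.
other_order = apply(matmul(R, T), p)
```

Here combined and step_by_step agree, while other_order lands somewhere else, which is the whole point of reading the chain from right to left.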

Related

Camera motion from corresponding images

I'm trying to calculate a new camera position based on the motion of corresponding images.
The images conform to the pinhole camera model.
So far I don't get useful results, so I'll describe my procedure and hope that somebody can help me.
I detect the features of the corresponding images with SIFT, match them with OpenCV's FlannBasedMatcher and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).
Then I calculate the essential matrix by the camera intrinsic matrix (K):
Mat E = K.t() * F * K;
I decompose the essential matrix to rotation and translation with singular value decomposition:
SVD decomp = SVD(E);
Matx33d W(0,-1,0,
1,0,0,
0,0,1);
Matx33d Wt(0,1,0,
-1,0,0,
0,0,1);
R1 = decomp.u * Mat(W) * decomp.vt;
R2 = decomp.u * Mat(Wt) * decomp.vt;
t1 = decomp.u.col(2); //u3
t2 = -decomp.u.col(2); //u3
Then I try to find the correct solution by triangulation. (This part is from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/ so I think it should work correctly.)
The new position is then calculated with:
new_pos = old_pos + -R.t()*t;
where new_pos & old_pos are vectors (3x1), R the rotation matrix (3x3) and t the translation vector (3x1).
Unfortunately I got no useful results, so maybe anyone has an idea what could be wrong.
Here are some results (just in case someone can confirm that any of them is definitely wrong):
F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
-0.001052930954975217, -0.001278667878010564, 1]
K = [150, 0, 300;
0, 150, 400;
0, 0, 1]
E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
-0.04396975675562629, -0.05262169424538553, 0.04904210357279387]
t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]
R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
-0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
-0.4503860776474556, 0.8236506374002566, 0.3446041331317597]
First of all you should check if
x' * F * x = 0
for your point correspondences x' and x. This should be of course only the case for the inliers of the fundamental matrix estimation with RANSAC.
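The epipolar check above can be sketched in plain Python on synthetic data. Everything here is an assumption made for illustration: the rotation, translation and 3D points are made up, K is taken from the question, and the helper names are hypothetical. The sketch builds E = [t]x R, converts it to F with the inverse intrinsics, and evaluates x' * F * x for exact correspondences and for one deliberately wrong match.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def skew(t):
    """Cross-product matrix [t]x."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def epipolar_residual(F, x1, x2):
    """Evaluate x2^T * F * x1 for homogeneous pixel coordinates."""
    Fx1 = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Fx1[i] for i in range(3))

# Assumed ground-truth motion: 10 degrees about the y axis plus a translation.
a = math.radians(10.0)
R = [[math.cos(a), 0.0, math.sin(a)], [0.0, 1.0, 0.0], [-math.sin(a), 0.0, math.cos(a)]]
t = [1.0, 0.0, 0.2]

f, cx, cy = 150.0, 300.0, 400.0  # intrinsics as in the question
K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

E = matmul(skew(t), R)
F = matmul(matmul(transpose(K_inv), E), K_inv)  # F = K^-T * E * K^-1

def to_pixel(X):
    x = [sum(K[i][j] * X[j] for j in range(3)) for i in range(3)]
    return [x[0] / x[2], x[1] / x[2], 1.0]

points = [[0.5, -0.2, 4.0], [-1.0, 0.3, 6.0], [0.1, 0.8, 3.0]]
residuals = []
for X in points:
    X2 = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    residuals.append(epipolar_residual(F, to_pixel(X), to_pixel(X2)))

# A deliberately wrong match should give a clearly non-zero residual.
outlier_residual = epipolar_residual(F, to_pixel(points[0]), [350.0, 100.0, 1.0])
```

With real matches the residuals will not be exactly zero, so in practice you would compare them against a small threshold rather than test for equality.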
Thereafter, you have to transform your point correspondences to normalized image coordinates (NCC) like this
xn = inv(K) * x
xn' = inv(K') * x'
where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think in your case it is K = K'.
With these NCCs you can decompose your essential matrix like you described. You triangulate the normalized camera coordinates and check the depth of your triangulated points. But be careful, in literature they say that one point is sufficient to get the correct rotation and translation. From my experience you should check a few points since one point can be an outlier even after RANSAC.
Before you decompose the essential matrix make sure that E=U*diag(1,1,0)*Vt. This condition is required to get correct results for the four possible choices of the projection matrix.
When you've got the correct rotation and translation you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then, you should compute the reprojection error. Firstly, you compute the reprojected position like this
xp = K * P * X
xp' = K' * P' * X
where X is the computed (homogeneous) 3D position. P and P' are the 3x4 projection matrices. The projection matrix P is normally given by the identity, i.e. P = [I, 0]. P' = [R, t] is given by the rotation matrix in the first 3 columns and rows and the translation in the fourth column, so that P' is a 3x4 matrix. This only works if you transform your 3D position to homogeneous coordinates, i.e. 4x1 vectors instead of 3x1. Then, xp and xp' are also homogeneous coordinates representing your (reprojected) 2D positions of your corresponding points.
I think the
new_pos = old_pos + -R.t()*t;
is incorrect since firstly, you only translate the old_pos and you do not rotate it and secondly, you translate it with a wrong vector. The correct way is given above.
So, after you computed the reprojected points you can calculate the reprojection error. Since you are working with homogeneous coordinates you have to normalize them (xp = xp / xp(2), divide by last coordinate). This is given by
error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2
If the error is large, such as 10^2, your intrinsic camera calibration or your rotation/translation are incorrect (perhaps both). Depending on your coordinate system you can try to invert your projection matrices. To do that you need to transform them to homogeneous form first, since you cannot invert a 3x4 matrix (without the pseudo inverse). Thus, add the fourth row [0 0 0 1], compute the inverse and remove the fourth row.
There is one more thing with reprojection error. In general, the reprojection error is the squared distance between your original point correspondence (in each image) and the reprojected position. You can take the square root to get the Euclidean distance between both points.
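The reprojection step can be sketched as follows in plain Python. The function names project and reprojection_error and the sample 3D point are assumptions for illustration. Applying R and t directly to a 3x1 point, as done here, is equivalent to multiplying the 3x4 matrix [R, t] by the 4x1 homogeneous point.

```python
import math

def project(K, R, t, X):
    """Project a 3D point X with camera [R | t] and intrinsics K to pixels."""
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # Normalize the homogeneous result by its last coordinate.
    return (x[0] / x[2], x[1] / x[2])

def reprojection_error(observed, reprojected):
    """Squared pixel distance between an observed and a reprojected point."""
    return ((observed[0] - reprojected[0]) ** 2
            + (observed[1] - reprojected[1]) ** 2)

K = [[150.0, 0.0, 300.0], [0.0, 150.0, 400.0], [0.0, 0.0, 1.0]]  # as in the question
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zero = [0.0, 0.0, 0.0]

X = [0.5, -0.2, 4.0]                 # assumed triangulated point
xp = project(K, identity, zero, X)   # first camera, P = [I, 0]

# Pretend the measured feature is 3 px right and 4 px below the reprojection.
observed = (xp[0] + 3.0, xp[1] + 4.0)
err = reprojection_error(observed, xp)
dist = math.sqrt(err)                # Euclidean distance in pixels
```

For the second camera you would call project with the recovered R and t instead of the identity and zero vector.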
To update your camera position, you have to update the translation first, then update the rotation matrix.
t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;
where t_ref and R_ref are your camera state, R and t are new calculated camera rotation and translation, and lambda is the scale factor.

Moving multiple sprites in elliptical path with uniform speed

I'm trying to move multiple sprites (images) in an elliptical path such that distance (arc distance) remains uniform.
I have tried
Move each sprite angle by angle, however the problem with this is that distance moved while moving unit angle around major axis is different than that while moving unit angle around minor axis - hence different distance moved.
Move sprites with just changing x-axis uniformly, however it again moves more around major axis.
So any ideas how to move sprites uniformly without them catching-up/overlapping each other?
Other info:
It will be called in onMouseMove/onTouchMoved, so I guess it shouldn't be very CPU intensive.
Although it's a general algorithm question, if it helps, I'm using cocos2d-x.
So this is what i ended up doing (which solved it for me):
I moved it along the equation of a circle and increased the angle by 1 degree. I calculated x and y using sin/cos(angle) * radius. To turn it into an ellipse I multiplied y by a factor.
The factor was yIntercept/xIntercept.
So it looked like this in the end:
FACTOR = Y_INTERCEPT / X_INTERCEPT;
// calculate the previous angle in radians (atan2 handles all four quadrants)
angle = atan2(prev_y / FACTOR, prev_x);
// increase the angle by 1 degree, converted to radians because cos/sin expect radians
angle += M_PI / 180.0;
// new x and y
x = cos(angle) * X_INTERCEPT;
y = sin(angle) * X_INTERCEPT * FACTOR;
I have written a function named getPointOnEllipse that allows you to move your sprites pixel-by-pixel in an elliptical path. The function determines the coordinates of a particular point in the elliptical path, given the coordinates of the center of the ellipse, the lengths of the semi-major axis and the semi-minor axis, and finally the offset of the point into the elliptical path, all in pixels.
Note: To be honest, unfortunately, the getPointOnEllipse function skips (does not detect) a few of the points in the elliptical path. As a result, the arc distance is not exactly uniform. Sometimes it is one pixel, and sometimes two pixels, but not three or more! In spite of the fault, changes in speed will be really "faint", and IMO, your sprites will move pretty smoothly.
Below is the getPointOnEllipse function, along with another function named getEllipsePerimeter, which is used to determine an ellipse's perimeter through Euler's formula. The code is written in JScript.
function getEllipsePerimeter(rx, ry)
{
    // You'll need to floor the return value to obtain the ellipse perimeter in pixels.
    return Math.PI * Math.sqrt(2 * (rx * rx + ry * ry));
}
function getPointOnEllipse(cx, cy, rx, ry, d)
{
    // Note: theta expresses an angle in radians!
    var theta = d * Math.sqrt(2 / (rx * rx + ry * ry));
    //var theta = 2 * Math.PI * d / getEllipsePerimeter(rx, ry);
    return {x: Math.floor(cx + Math.cos(theta) * rx),
            y: Math.floor(cy - Math.sin(theta) * ry)};
}
The following figure illustrates the parameters of this function:
cx - the x-coordinate of the center of the ellipse
cy - the y-coordinate of the center of the ellipse
rx - the length of semi-major axis
ry - the length of semi-minor axis
d - the offset of the point into the elliptical path (i.e. the arc length from the vertex to the point)
The unit of all parameters is pixel.
The function returns an object containing the x- and y-coordinate of the point of interest, which is represented by a purple ball in the figure.
d is the most important parameter of the getPointOnEllipse function. You should call this function multiple times. In the first call, set d to 0, and then place the sprite at the point returned, which causes the sprite to be positioned on the vertex. Then wait a short period (e.g. 50 milliseconds), and call the function again, setting d parameter to 1. This time, by placing the sprite at the point returned, it moves 1 pixel forward in the ellipse path. Then repeat doing so (wait a short period, call the function with increased d value, and position the sprite) until the value of d reaches the perimeter of the ellipse. You can also increase d value by more than one, so that the sprite moves more pixels forward in each step, resulting in faster movement.
Moreover, you can modify the getEllipsePerimeter function in the code to use a more precise formula (like Ramanujan's formula) for getting ellipse perimeter. But in that case, be sure to modify the getPointOnEllipse function as well to use the second version of theta variable (which is commented in the code). Note that the first version of theta is just a simplified form of the second version for the sake of optimization.
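Since getPointOnEllipse maps d to an angle linearly, the arc distance it produces is only approximately uniform, as noted above. A different technique, which is not the method of this answer, is to precompute a table of cumulative arc lengths once and look up the angle for a given distance. The sketch below assumes hypothetical names (build_arc_table, point_at_distance) and a sample count of 2048; the table is built once, so each call from a mouse or touch handler is only a binary search.

```python
import math
from bisect import bisect_left

def build_arc_table(rx, ry, samples=2048):
    """Cumulative polyline length at evenly spaced parameter angles."""
    table = [0.0]
    px, py = rx, 0.0
    for i in range(1, samples + 1):
        theta = 2.0 * math.pi * i / samples
        x, y = rx * math.cos(theta), ry * math.sin(theta)
        table.append(table[-1] + math.hypot(x - px, y - py))
        px, py = x, y
    return table

def point_at_distance(cx, cy, rx, ry, d, table):
    """Point whose arc distance from the vertex is d (wraps around)."""
    samples = len(table) - 1
    d %= table[-1]
    i = bisect_left(table, d)  # first index with table[i] >= d
    if i == 0:
        theta = 0.0
    else:
        # Interpolate the angle linearly inside the bracketing table segment.
        frac = (d - table[i - 1]) / (table[i] - table[i - 1])
        theta = 2.0 * math.pi * (i - 1 + frac) / samples
    # Same screen convention as getPointOnEllipse: y grows downwards.
    return (cx + rx * math.cos(theta), cy - ry * math.sin(theta))

table = build_arc_table(200.0, 100.0)
perimeter = table[-1]

# Distance actually travelled for a 1-pixel step at several places on the path.
steps = []
for d in range(0, 900, 50):
    ax, ay = point_at_distance(0.0, 0.0, 200.0, 100.0, d, table)
    bx, by = point_at_distance(0.0, 0.0, 200.0, 100.0, d + 1, table)
    steps.append(math.hypot(bx - ax, by - ay))
```

To space several sprites evenly, give sprite k the distance d + k * perimeter / n, which keeps them from catching up with each other.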

How to transform an image based on the position of camera

I'm trying to create a perspective projection of an image based on the look direction. I'm inexperienced in this field and can't manage to do that myself, however. Will you help me, please?
There is an image and an observer (camera). If camera can be considered an object on an invisible sphere and the image a plane going through the middle of the sphere, then camera position can be expressed as:
x = d cos(θ) cos(φ)
y = d sin(θ)
z = d sin(φ) cos(θ)
Where θ is latitude, φ is longitude and d is the distance (radius) from the middle of the sphere where the middle of the image is.
I found these formulae somewhere, but I'm not sure about the coordinates (it looks to me as if x should be z, but I guess it depends on the coordinate system).
Now, what I need to do is make a proper transformation of my image so it looks as if viewed from the camera (in a proper perspective). Would you be so kind to tell me a few words how this could be done? What steps should I take?
I'm developing an iOS app and I thought I could use the following method from the QuartzCore. But I have no idea what angle I should pass to this method and how to derive the new x, y, z coordinates from the camera position.
CATransform3D CATransform3DRotate (CATransform3D t, CGFloat angle,
CGFloat x, CGFloat y, CGFloat z)
So far I have successfully created a simple viewing perspective by:
using an identity matrix (as the CATransform3D parameter) with .m34 set to 1/-1000,
rotating my image by the angle of φ with the (0, 1, 0) vector,
concatenating the result with a rotation by θ and the (1, 0, 0) vector,
scaling based on the d is ignored (I scale the image based on some other criteria).
But the result I got was not what I wanted (which was obvious) :-/. The perspective looks realistic as long as one of these two angles is close to 0. Therefore I thought there could be a way to calculate somehow a proper angle and the x, y and z coordinates to achieve a proper transformation (which might be wrong because it's just my guess).
I think I managed to find a solution, but unfortunately based on my own calculations, thoughts and experiments, so I have no idea if it is correct. Seems to be OK, but you know...
So if the coordinate system is like this:
and the plane of the image to be transformed goes through the X and the Y axis, and its centre is in the origin of the system, then the following coordinates:
x = d sin(φ) cos(θ)
y = d sin(θ)
z = d cos(θ) cos(φ)
define a vector that starts in the origin of the coordinate system and points to the position of the camera that is observing the image. The d can be set to 1 so we get a unit vector at once without further normalization. Theta is the angle in the ZY plane and phi is the angle in the ZX plane. Theta increases from 0° to 90° from the Z+ to the Y+ axis, whereas phi increases from 0° to 90° from the Z+ to the X+ axis (and to -90° in the opposite direction, in both cases).
Hence the transformation vector is:
x1 = -y / z
y1 = -x / z
z1 = 0.
I'm not sure about z1 = 0, however rotation around the Z axis seemed wrong to me.
The last thing to calculate is the angle by which the image has to be transformed. In my humble opinion this should be the angle between the vector that points to the camera (x, y, z) and the vector normal to the image, which is the Z axis (0, 0, 1).
The dot product of two vectors gives the cosine of the angle between them, so the angle is:
α = arccos(x * 0 + y * 0 + z * 1) = arccos(z).
Therefore the alpha angle and the x1, y1, z1 coordinates are the parameters of CATransform3DRotate method I mentioned in my question.
I would be grateful if somebody could tell me if this approach is correct. Thanks a lot!
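The camera vector and the angle α from this answer can at least be checked numerically. The sketch below only evaluates the formulas as written above; it does not settle whether the rotation axis (x1, y1, z1) is correct. The function names are assumptions for illustration.

```python
import math

def camera_direction(theta_deg, phi_deg, d=1.0):
    """Vector from the image centre to the camera, per the formulas above."""
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    x = d * math.sin(phi) * math.cos(theta)
    y = d * math.sin(theta)
    z = d * math.cos(theta) * math.cos(phi)
    return (x, y, z)

def angle_to_image_normal(direction):
    """Angle in degrees between the direction and the image normal (0, 0, 1)."""
    x, y, z = direction
    length = math.sqrt(x * x + y * y + z * z)
    # Clamp to guard acos against tiny floating-point overshoot.
    cos_alpha = max(-1.0, min(1.0, z / length))
    return math.degrees(math.acos(cos_alpha))

front = camera_direction(0.0, 0.0)    # camera straight in front of the image
side = camera_direction(0.0, 90.0)    # camera on the X+ axis
tilted = camera_direction(30.0, 0.0)  # camera raised by 30 degrees

alpha_front = angle_to_image_normal(front)
alpha_side = angle_to_image_normal(side)
alpha_tilted = angle_to_image_normal(tilted)
```

With d = 1 the vector has unit length for any θ and φ, so arccos(z) can be used directly, as the answer does.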

trying to understand the Affine Transform

I am playing with the affine transform in OpenCV and I am having trouble getting an intuitive understanding of its workings, and more specifically, just how I should specify the parameters of the map matrix to get a specific desired result.
To setup the question, the procedure I am using is 1st to define a warp matrix, then do the transform.
In OpenCV the 2 routines are (I am using an example in the excellent book OpenCV by Bradski & Kaehler):
cvGetAffineTransform(srcTri, dstTri, warp_mat);
cvWarpAffine(src, dst, warp_mat);
To define the warp matrix, srcTri and dstTri are defined as:
CvPoint2D32f srcTri[3], dstTri[3];
srcTri[3] is populated as follows:
srcTri[0].x = 0;
srcTri[0].y = 0;
srcTri[1].x = src->width - 1;
srcTri[1].y = 0;
srcTri[2].x = 0;
srcTri[2].y = src->height -1;
This is essentially the top left point, top right point, and bottom left point of the image for starting point of the matrix. This part makes sense to me.
But the values for dstTri[3] just are confusing, at least, when I vary a single point, I do not get the result I expect.
For example, if I then use the following for the dstTri[3]:
dstTri[0].x = 0;
dstTri[0].y = 0;
dstTri[1].x = src->width - 1;
dstTri[1].y = 0;
dstTri[2].x = 0;
dstTri[2].y = 100;
It seems that the only difference between the src and the dst point is that the bottom left point is moved to the right by 100 pixels. Intuitively, I feel that the bottom part of the image should be shifted to the right by 100 pixels, but this is not so.
Also, if I use the exact same values for dstTri[3] that I use for srcTri[3], I would think that the transform would produce the exact same image--but it does not.
Clearly, I do not understand what is going on here. So, what does the mapping from the srcTri[] to the dstTri[] represent?
Here is a mathematical explanation of an affine transform:
this is a matrix of size 3x3 that applies the following transformations on a 2D vector: Scale in X axis, scale Y, rotation, skew, and translation on the X and Y axes.
These are 6 transformations and thus you have six elements in your 3x3 matrix. The bottom row is always [0 0 1].
Why? because the bottom row represents the perspective transformation in axis x and y, and affine transformation does not include perspective transform.
(If you want to apply perspective warping use homography: also 3x3 matrix )
What is the relation between 6 values you insert into affine matrix and the 6 transformations it does? Let us look at this 3x3 matrix like
e*Zx*cos(a), -q1*sin(a), dx,
e*q2*sin(a), Zy*cos(a), dy,
0, 0, 1
The dx and dy elements are the translation along the x and y axes (they just move the picture left-right, up-down).
Zx is the relative scale(zoom) you apply to the image in X axis.
Zy is the same as above for y axis
a is the angle of rotation of the image. This is tricky since when you want to rotate by 'a' you have to insert sin(), cos() in 4 different places in the matrix.
'q' is the skew parameter. It is rarely used. It will cause your image to skew on the side (q1 causes y axis affects x axis and q2 causes x axis affect y axis)
Bonus: the 'e' parameter is actually not a transformation. It can have the values 1 or -1. If it is 1 then nothing happens, but if it is -1 then the image is flipped horizontally. You can also use it to flip the image vertically, but this type of transformation is rarely used.
Very important Note!!!!!
The above explanation is mathematical. It assumes you multiply the matrix by the column vector from the right. As far as I remember, Matlab uses reverse multiplication (row vector from the left) so you will need to transpose this matrix. I am pretty sure that OpenCV uses regular multiplication but you need to check it.
Just enter only translation matrix (x shifted by 10 pixels, y by 1).
1,0,10
0,1,1
0,0,1
If you see a normal shift then everything is OK, but if the result looks wrong, then transpose the matrix to:
1,0,0
0,1,0
10,1,1
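To see what the mapping from srcTri to dstTri represents, the 2x3 matrix can be solved by hand from the three point pairs. The sketch below is plain Python, not OpenCV: affine_from_triangles is a hypothetical helper that solves the same kind of linear system, under the column-vector convention described above, and the 640x480 image size is an assumed example.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a * x = b with Cramer's rule."""
    d = det3(a)
    if d == 0:
        raise ValueError("source points are collinear")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution

def affine_from_triangles(src, dst):
    """2x3 matrix M with dst_i = M * (x_i, y_i, 1) for three point pairs."""
    a = [[x, y, 1.0] for x, y in src]
    return [solve3(a, [p[0] for p in dst]),
            solve3(a, [p[1] for p in dst])]

width, height = 640, 480  # assumed image size
src = [(0.0, 0.0), (width - 1.0, 0.0), (0.0, height - 1.0)]

same = affine_from_triangles(src, src)
# Setting only dstTri[2].y = 100, as in the question.
squashed = affine_from_triangles(src, [(0.0, 0.0), (width - 1.0, 0.0), (0.0, 100.0)])
# Moving the bottom-left corner 100 px to the right instead.
sheared = affine_from_triangles(src, [(0.0, 0.0), (width - 1.0, 0.0), (100.0, height - 1.0)])
```

Under these assumptions, identical triangles give the identity matrix, changing dstTri[2].y to 100 gives a vertical scale by 100/479 rather than a horizontal shift, and moving the bottom-left corner to (100, height - 1) gives a shear in x. This is only the math of the matrix; it does not explain what a particular program displayed.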

Matrix mult order in Direct3D

I've received two conflicting answers in terms of multiplying matrices in Direct3D to achieve results. Tutorials do state to multiply from left to right and that's fine but it's not how I would visualize it.
Here's an example:
OpenGL (reading from top to bottom):
glRotatef(90.0f, 0.0f, 0.0f, 1.0f);
glTranslatef(20.0f, 0.0f, 0.0f);
So you visualize the world axis rotating 90 degrees. Then you translate 20.0 on the now rotated x-axis so it looks like you are going up on the world y-axis.
In Direct3D, doing:
wm = rotatem * translatem;
is different. It looks like the object was just rotated at the origin and translated on the world's x-axis so it goes to the right and not up. It only works once I reverse the order and read from right to left.
Also, for example, in Frank Luna's book on DX10, he explains how to do mirror reflections. I get all of that, but when he does for example:
reflection_matrix = world_m * reflection_m;
around the xy plane, do I interpret this as first doing the world positioning and then a reflection, or the opposite?
The problem is the order you are multiplying the matrices to get the composite transform matrix is reversed from what it should be. You are doing: wm = rotatem * translatem, which follows the order of operations you are doing for OpenGL, but for DirectX the matrix should have been wm = translatem * rotatem
The fundamental difference between OpenGL and DirectX arises from the fact that OpenGL treats matrices in column major order, while DirectX treats matrices in row major order.
To go from column major to row major you need to find the transpose ( swap the rows and the columns ) of the OpenGL matrix.
So, if you write wm = rotatem * translatem in OpenGL, then you want the transpose of that for DirectX, which is:
wmT = (rotatem*translatem)T = translatemT * rotatemT
which explains why the order of the matrix multiply has to be reversed in DirectX.
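The transpose identity can be checked with a small plain-Python sketch. The helper names and sample values are assumptions for illustration, and no OpenGL or Direct3D call is involved: a column vector multiplied on the right of rotatem * translatem gives the same point as a row vector multiplied on the left of translatemT * rotatemT.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

c, s = math.cos(math.radians(90.0)), math.sin(math.radians(90.0))
rotatem = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]          # column-vector form
translatem = [[1.0, 0.0, 20.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Column-vector convention: the point sits on the right.
v_col = [[1.0], [0.0], [1.0]]
column_result = matmul(matmul(rotatem, translatem), v_col)

# Row-vector convention: transposed matrices, reversed order, point on the left.
v_row = [[1.0, 0.0, 1.0]]
row_result = matmul(v_row, matmul(transpose(translatem), transpose(rotatem)))

# Both conventions describe the same transformed point.
same_point = all(abs(column_result[i][0] - row_result[0][i]) < 1e-9 for i in range(3))
```

In both cases the translation reaches the point first and the rotation second, which is why the point ends up at roughly (0, 21).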
See this answer. In OpenGL, each subsequent call post-multiplies the current matrix, so the operation issued last is the first one applied to the vertex. You can see a matrix multiplication of a vector as a function evaluation.
If what you want is to first rotate a vector and then translate your rotated vector, which in OpenGL you would have solved by first calling glTranslatef and then calling glRotatef, you could express that using function calls as
myNewVector = translate(rotate(myOldVector))
The rotate function does this
rotate(anyVector) = rotationMatrix * anyVector
and the translate function does this
translate(anyOtherVector) = translationMatrix * anyOtherVector
so your equivalent expression using matrix multiplications would look like
myNewVector = translationMatrix * rotationMatrix * myOldVector
That is, your combined transformation matrix would be translationMatrix * rotationMatrix.
