I am having a little problem with my rotation. My model rotates, but then it ends up slightly off-center in the camera view. Here is my rotation code; is it correct?
Matrix m = Matrix.CreateRotationZ(MathHelper.ToRadians(30));
Matrix objectMatrix = Matrix.CreateTranslation(objectPos) * m;
As Lucius notes in his comment, a matrix represents a sequence of linear transformations. The order in which you multiply them together is important.
RotationMatrix * TranslationMatrix means, "Rotate this model around its axis in model space, then translate the rotated model into world space."
TranslationMatrix * RotationMatrix means, "Translate this model into world space, then rotate the model around the world's origin."
Change the order of your matrix multiplications.
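To see the difference numerically, here is a minimal NumPy sketch (not XNA code). It assumes the row-vector convention described above, where a point is multiplied on the left (p * M) so the leftmost matrix is applied first. The helper names rotation_z and translation and the position (10, 0, 0) are made up for the illustration.

```python
import numpy as np

def rotation_z(degrees):
    """4x4 rotation about Z for row vectors (p @ M). Hypothetical helper."""
    a = np.radians(degrees)
    c, s = np.cos(a), np.sin(a)
    return np.array([[  c,   s, 0.0, 0.0],
                     [ -s,   c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def translation(x, y, z):
    """4x4 translation for row vectors: the offset lives in the last row."""
    m = np.eye(4)
    m[3, :3] = [x, y, z]
    return m

model_origin = np.array([0.0, 0.0, 0.0, 1.0])  # the model's own pivot point
rot = rotation_z(30)
trans = translation(10.0, 0.0, 0.0)            # assumed objectPos

rotate_then_translate = rot @ trans   # spin in place, then move into the world
translate_then_rotate = trans @ rot   # move first, then swing around the world origin

centered = model_origin @ rotate_then_translate
swung = model_origin @ translate_then_rotate

print(centered)  # the pivot lands exactly on objectPos
print(swung)     # the pivot has been carried around the world origin
```

With rotation first, the model's pivot ends up exactly at objectPos. With translation first, the pivot is swung around the world origin, which is the off-center drift described in the question.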
As an aside, multiplying matrices using the * operator can quickly become expensive when you start adding more transformations to the sequence; it can end up creating a lot of extra matrices that are never actually used. You can optimize this by using the static Multiply() method on the Matrix class.
I cannot find a good source that explains why we need to convert a rotation vector to a rotation matrix [in the context of calculating the angle between two ArUco markers].
We are using
rmat, _ = cv2.Rodrigues(rvec)
rmat1, _ = cv2.Rodrigues(rvec1)
relative_rmat = rmat1 @ rmat.T
My questions are
Why are we converting the rotation vector to a rotation matrix?
Can I please get the source of relative_rmat's formula? I am trying to understand the geometrical concept.
I have tried to understand this from Wikipedia, but I am getting more confused. It would be helpful if anyone could provide a source for the concept behind both questions.
Translation of a rigid body (or tvec as used in OpenCV conventions) lives in 3D Euclidean space. We call this the 'configuration space'.
Assume there is a rigid body at a point we call pos1. A 3x1 vector pos1 = [x1, y1, z1] defines its position completely and uniquely. 'Unique' here means that no other triple describes pos1, and that going to [x1, y1, z1] always takes you to pos1.
Any attitude (the orientation of a rigid body in three-dimensional space) can be represented in many different ways. However, attitudes live in a space called the 'special orthogonal group', or SO(3). This is the configuration space for rotations, and elements in this space are what you were referring to as 'rotation matrices'.
All other ways of defining a rotation, such as Euler angles, rotation vectors (rvec in OpenCV), or quaternions, are 'local parameterizations' of a given rotation matrix. They therefore have several issues, including not being unique at some rotations. The Gimbal Lock wiki page has some nice visualizations.
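As a small illustration of the non-uniqueness, here is a NumPy sketch of gimbal lock. It assumes a yaw-pitch-roll (Z-Y-X) Euler convention with column vectors; the convention, helper names, and angles are my choices for the example, not something OpenCV prescribes.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def euler_zyx(yaw, pitch, roll):
    """Rotation matrix for the assumed Z-Y-X (yaw, pitch, roll) convention."""
    return rot_z(yaw) @ rot_y(pitch) @ rot_x(roll)

pitch = np.radians(90.0)  # the singular configuration
first = euler_zyx(np.radians(10.0), pitch, np.radians(30.0))
second = euler_zyx(np.radians(40.0), pitch, np.radians(60.0))

# Two different Euler triples, one and the same rotation matrix.
same_rotation = np.allclose(first, second)
print(same_rotation)
```

At a pitch of 90 degrees only the difference between roll and yaw matters in this convention, so infinitely many angle triples describe the same attitude, while the rotation matrix itself stays unambiguous.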
To avoid such issues, the simplest way is to use rotation matrices. Even though they seem complex, rotation matrices can be much easier to work with once you get used to them. Given the properties of SO(3), the inverse of a rotation matrix is the transpose of that matrix (which is why you get the rmat.T when you are trying to get the relative rotation).
Let's assume that you have two markers named 'marker' and 'marker1' that correspond to rvec and rvec1 respectively. Your rmat is the rotation of 'marker' with respect to the camera frame, or how a vector in 'marker' can be represented in the camera frame (I know this can be confusing, but this is how it is defined, so stay with me).
Similarly, rmat1 is how a vector in 'marker1' is represented in the camera frame. Also, keep in mind that these matrices are directional, meaning we need inverse(rmat1) if we want to find how a vector in the camera frame is represented in the 'marker1' frame.
Your relative_rmat is how you represent a vector in 'marker' in 'marker1'. You cannot hop between markers at random; you always need to go through a common frame. First you have to transform a vector in 'marker' to the camera frame, and then transform that to the 'marker1' frame. We can write that as
relative_rmat = rmat1 @ inverse(rmat)
But as I said above, a special property of a rotation matrix dictates that its inverse is the same as its transpose. So we write it as:
relative_rmat = rmat1 @ rmat.T
The order matters here: you should always start with the first rotation and pre-multiply subsequent rotations. If you want to go the other way, you just need to take the inverse of relative_rmat. With simple matrix math, we can see that this is the same as following the rotations manually, as I described above.
relative_rmat_inverse
= relative_rmat.T
= (rmat1 @ rmat.T).T
= (rmat.T).T @ rmat1.T
= rmat @ rmat1.T
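The identities above can be checked numerically. The sketch below is NumPy only: rotation_from_rvec is a hand-written Rodrigues formula standing in for cv2.Rodrigues, and the two rotation vectors are made-up values (20 and 65 degrees about the same axis) so the expected angle is easy to verify by eye.

```python
import numpy as np

def rotation_from_rvec(rvec):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix.
    Hand-written stand-in for cv2.Rodrigues, for illustration only."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rotation_angle_deg(rmat):
    """Angle of a rotation matrix, from trace(R) = 1 + 2*cos(theta)."""
    cos_theta = (np.trace(rmat) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

rvec = np.array([0.0, 0.0, np.radians(20.0)])   # assumed pose of 'marker'
rvec1 = np.array([0.0, 0.0, np.radians(65.0)])  # assumed pose of 'marker1'
rmat = rotation_from_rvec(rvec)
rmat1 = rotation_from_rvec(rvec1)

inverse_is_transpose = np.allclose(rmat.T @ rmat, np.eye(3))
relative_rmat = rmat1 @ rmat.T
relative_rmat_inverse = rmat @ rmat1.T
angle_deg = rotation_angle_deg(relative_rmat)

print(inverse_is_transpose)                                  # True
print(np.allclose(relative_rmat_inverse, relative_rmat.T))   # True
print(round(angle_deg, 6))                                   # 45.0
```

One thing worth checking against your own frame convention: rmat1 @ rmat.T and rmat1.T @ rmat are different matrices in general, but they have the same trace and therefore the same rotation angle. If you only need the angle between the markers, either composition gives the same number.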
It is hard to detail everything here, and it can take a while to understand the math behind the rotation matrices. I would recommend this as a good reference, but it can get very technical depending on your background. If you are new to this, start with basics of robotics, coordinate transformations, and then move to SO(3).
I use the DirectX Tool Kit to display a 3D model, following the 'Rendering a model' tutorial, and my pyramid is displayed.
When trying to transform the object, the scaling and rotation work well, but I'm not sure how to move the object (translate it) around. Basically I'm looking for an algorithm that determines, given the current camera position, focus, viewport, and the rendered model (for which the DirectX Tool Kit gives me the bounding box, so its "size"), the minimum and maximum values for XYZ translation so that the object remains visible.
The bounding box is always the same, no matter the viewport size, so how do I compare its size against my viewport?
Please excuse my newbiness, I'm not a 3D developer, at least yet.
The "Simple Rendering" example which draws a triangle:
Matrix proj = Matrix::CreateScale(2.f / float(backBufferWidth),
    -2.f / float(backBufferHeight), 1.f)
    * Matrix::CreateTranslation(-1.f, 1.f, 0.f);
m_effect->SetProjection(proj);
says that the normalized triangle size is [1,1,1] but here normalized values do not work.
TL;DR: To move your model around the world, create a matrix for the translation and set it as the world matrix with SetWorld.
Matrix world = Matrix::CreateTranslation(2.f, 1.f, 3.f);
m_effect->SetWorld(world);
// Also be sure you have called SetView and SetProjection for the 3D camera setup
// covered in the 3D shapes / Rendering a model tutorial
You should start with a basic review of 3D transformations, in particular the world -> view -> projection transformation pipeline.
The world transformation performs the affine transformation to get the model you are rendering into its 'world' position (a.k.a. 'local coordinates to world coordinates' transformation).
The view transformation performs the transformation to get world positions into the camera's point of view (i.e. position and direction) (a.k.a. 'world coordinates to view coordinates transformation').
The projection transformation performs the transformation to get the view positions into the canonical "-1 to 1" range that the actual hardware uses, including any perspective projection (a.k.a. 'view coordinates to clip coordinates' transformation).
The hardware itself performs the final step of converting the "-1 to 1" to pixel locations in the render target based on the Direct3D SetViewport information (a.k.a. 'clip' coordinates to pixel coordinates transformation).
This Direct3D 9 era article is a bit dated, but it covers the overall idea well.
In the DirectX Tool Kit BasicEffect system, there are distinct methods for each of these matrices: SetWorld, SetView, and SetProjection. There is also a helper, SetMatrices, if you want to set all three at once.
The simple rendering tutorial is concerned with the simplest form of rendering, 2D rendering, where you want the coordinates you provide to be in natural 'pixel coordinates':
Matrix proj = Matrix::CreateScale(2.f / float(backBufferWidth),
    -2.f / float(backBufferHeight), 1.f)
    * Matrix::CreateTranslation(-1.f, 1.f, 0.f);
m_effect->SetProjection(proj);
The purpose of this matrix is to basically 'undo' what the SetViewport will do so that you can think in simple pixel coordinates. It's not suitable for 3D models.
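A quick way to convince yourself of this is to push a few pixel coordinates through the same scale-then-translate matrix. The sketch below is NumPy rather than SimpleMath, uses row vectors (point @ matrix), and assumes an 800x600 back buffer purely for the example.

```python
import numpy as np

def pixel_to_clip_matrix(width, height):
    """Row-vector equivalent of CreateScale(2/w, -2/h, 1) * CreateTranslation(-1, 1, 0)."""
    scale = np.diag([2.0 / width, -2.0 / height, 1.0, 1.0])
    translate = np.eye(4)
    translate[3, :3] = [-1.0, 1.0, 0.0]
    return scale @ translate

proj = pixel_to_clip_matrix(800, 600)  # assumed back buffer size

top_left = np.array([0.0, 0.0, 0.0, 1.0]) @ proj
center = np.array([400.0, 300.0, 0.0, 1.0]) @ proj
bottom_right = np.array([800.0, 600.0, 0.0, 1.0]) @ proj

print(top_left[:2])      # [-1.  1.]
print(center[:2])        # [0. 0.]
print(bottom_right[:2])  # [ 1. -1.]
```

Pixel (0, 0) lands on the top-left corner of the "-1 to 1" range and pixel (800, 600) on the bottom-right corner, with Y flipped, which is exactly the inverse of what the viewport step does afterwards.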
In the 3D shapes tutorial I cover the basic camera model, but I leave the world matrix as the identity so the shape is sitting at the world origin.
m_view = Matrix::CreateLookAt(Vector3(2.f, 2.f, 2.f),
    Vector3::Zero, Vector3::UnitY);
m_proj = Matrix::CreatePerspectiveFieldOfView(XM_PI / 4.f,
    float(backBufferWidth) / float(backBufferHeight), 0.1f, 10.f);
In the Rendering a model tutorial, I also leave the world matrix as identity. I get into the basics of this in the Basic game math tutorial.
One of the nice properties of affine transformations is that you can perform them all at once by transforming by the concatenation of the individual transforms. Point p transformed by matrix W, then transformed by matrix V, then transformed by matrix P is the same as point p transformed by matrix W * V * P.
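Here is a rough NumPy sketch of that concatenation, which also shows one way to test whether a translated point is still on screen. The look_at_rh and perspective_fov_rh helpers are hand-written stand-ins that I believe mirror right-handed look-at and perspective matrices in a row-vector convention; they are assumptions for illustration, not the DirectX Tool Kit API, and the 800x600 aspect ratio and test positions are made up.

```python
import numpy as np

def translation(x, y, z):
    m = np.eye(4)
    m[3, :3] = [x, y, z]
    return m

def look_at_rh(eye, target, up):
    """Right-handed view matrix for row vectors (assumed convention)."""
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    z_axis = eye - target
    z_axis /= np.linalg.norm(z_axis)
    x_axis = np.cross(up, z_axis)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    m = np.eye(4)
    m[:3, 0], m[:3, 1], m[:3, 2] = x_axis, y_axis, z_axis
    m[3, :3] = [-x_axis @ eye, -y_axis @ eye, -z_axis @ eye]
    return m

def perspective_fov_rh(fov_y, aspect, near, far):
    """Right-handed perspective matrix with clip-space depth in [0, w]."""
    h = 1.0 / np.tan(fov_y / 2.0)
    w = h / aspect
    q = far / (near - far)
    return np.array([[w, 0.0, 0.0, 0.0],
                     [0.0, h, 0.0, 0.0],
                     [0.0, 0.0, q, -1.0],
                     [0.0, 0.0, q * near, 0.0]])

def is_visible(clip):
    """True if a clip-space point is inside the view frustum."""
    x, y, z, w = clip
    return bool(w > 0 and abs(x) <= w and abs(y) <= w and 0 <= z <= w)

view = look_at_rh((2.0, 2.0, 2.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
proj = perspective_fov_rh(np.pi / 4.0, 800.0 / 600.0, 0.1, 10.0)
world = translation(0.5, 0.0, 0.0)

p = np.array([0.0, 0.0, 0.0, 1.0])       # the model's origin in model space
step_by_step = ((p @ world) @ view) @ proj
all_at_once = p @ (world @ view @ proj)   # same result from one concatenated matrix

near_origin_visible = is_visible(all_at_once)
far_away_visible = is_visible(p @ (translation(20.0, 0.0, 0.0) @ view @ proj))
print(np.allclose(step_by_step, all_at_once), near_origin_visible, far_away_visible)
```

Running the corners of the model's bounding box through the same concatenated matrix and calling is_visible on each is one simple way to decide whether a candidate translation keeps the object on screen.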
I have a rotation-translation matrix [R T] (3x4).
Is there a function in opencv that performs the rotation-translation described by [R T]?
A lot of solutions to this question I think make hidden assumptions. I will try to give you a quick summary of how I think about this problem (I have had to think about it a lot in the past). Warping between two images is a 2 dimensional process accomplished by a 3x3 matrix called a homography. What you have is a 3x4 matrix which defines a transform in 3 dimensions. You can convert between the two by treating your image as a flat plane in 3 dimensional space. The trick then is to decide on the initial position in world space of your image plane. You can then transform its position and project it onto a new image plane with your camera intrinsics matrix.
The first step is to decide where your initial image lies in world space. Note that this does not have to be the same as what your initial R and T matrices specify. Those are in world coordinates, whereas we are talking about the image created by that world, in which all the objects have been flattened into a plane. The simplest decision here is to set the image at a fixed displacement on the z axis with no rotation. From this point on I will assume no rotation. If you would like to see the general case I can provide it, but it is slightly more complicated.
Next you define the transform between your two images in 3d space. Since you have both transforms with respect to the same origin, the transform from [A] to [B] is the same as the transform from [A] to your origin, followed by the transform from the origin to [B]. You can get that by
transform = [B]*inverse([A])
Now conceptually what you need to do is to take your first image, project its pixels onto the geometric interpretation of your image in 3d space, then transform those pixels in 3d space by the transform above, then project them back onto a new 2d image with your camera matrix. Those steps need to be combined into a single 3x3 matrix.
#include <opencv2/core.hpp>

cv::Matx33f convert_3x4_to_3x3(cv::Matx34f pose, cv::Matx33f camera_mat, float zpos)
{
    // converted condenses the 3x4 matrix, which transforms a homogeneous 3d
    // point in world space, into a 3x3 matrix. Instead of multiplying pose by a
    // 4x1 homogeneous 3d vector, by specifying that the incoming 3d points will
    // ALWAYS have a z coordinate of zpos, one can instead multiply converted by
    // a homogeneous 2d vector (x, y, 1) and get the same output.
    cv::Matx33f converted(pose(0,0), pose(0,1), pose(0,2)*zpos + pose(0,3),
                          pose(1,0), pose(1,1), pose(1,2)*zpos + pose(1,3),
                          pose(2,0), pose(2,1), pose(2,2)*zpos + pose(2,3));

    // This matrix takes a homogeneous 2d pixel coordinate and "projects" it
    // onto a flat plane at zpos. camera_mat.inv() turns the pixel into a ray
    // (x, y, 1); scaling the homogeneous component by 1/zpos is equivalent to
    // scaling x and y by zpos, i.e. following the ray until it hits the plane.
    cv::Matx33f projected(1, 0, 0,
                          0, 1, 0,
                          0, 0, 1.f/zpos);
    projected = projected*camera_mat.inv();

    // Now we have the pieces: a matrix which takes an incoming 2d point and
    // converts it into a pseudo 3d point (x and y correspond to 3d, z is
    // implied by zpos), and a matrix which takes our pseudo 3d point and
    // transforms it correctly. To turn the transformed point back into a 2d
    // point in our new image, simply multiply by the camera matrix.
    return camera_mat*converted*projected;
}
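If you want to sanity-check the idea, here is a small NumPy sketch of the same construction. It is my own port under stated assumptions, not OpenCV API: the camera matrix, zpos, and pose values are invented, and the third diagonal entry of the plane-lifting matrix is taken as 1/zpos, which is the value for which an identity pose produces an identity warp.

```python
import numpy as np

def convert_3x4_to_3x3(pose, camera_mat, zpos):
    """Condense a 3x4 pose into a 3x3 homography for points on the plane z = zpos."""
    pose = np.asarray(pose, dtype=float)
    camera_mat = np.asarray(camera_mat, dtype=float)
    # Columns 0, 1 and the translation column; fold the z column in via zpos.
    converted = pose[:, [0, 1, 3]].copy()
    converted[:, 2] += pose[:, 2] * zpos
    # Pixel -> ray -> homogeneous 2d point on the plane z = zpos.
    projected = np.diag([1.0, 1.0, 1.0 / zpos]) @ np.linalg.inv(camera_mat)
    return camera_mat @ converted @ projected, converted

camera_mat = np.array([[800.0, 0.0, 320.0],
                       [0.0, 800.0, 240.0],
                       [0.0, 0.0, 1.0]])   # assumed intrinsics
zpos = 2.5                                  # assumed plane depth

# Check 1: no motion between the views should give no warp.
identity_pose = np.hstack([np.eye(3), np.zeros((3, 1))])
identity_homography, _ = convert_3x4_to_3x3(identity_pose, camera_mat, zpos)

# Check 2: for a point on the plane, the 3x3 form matches the full 3x4 form.
c, s = np.cos(np.radians(10.0)), np.sin(np.radians(10.0))
pose = np.array([[c, -s, 0.0, 0.1],
                 [s, c, 0.0, -0.2],
                 [0.0, 0.0, 1.0, 0.3]])
_, converted = convert_3x4_to_3x3(pose, camera_mat, zpos)
point_on_plane = np.array([0.4, -0.7, zpos, 1.0])
full = pose @ point_on_plane
condensed = converted @ point_on_plane[[0, 1, 3]]

print(np.allclose(identity_homography, np.eye(3)), np.allclose(full, condensed))
```

The resulting 3x3 matrix is the kind of input a perspective warp expects, so it can be applied to the whole image in one call.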
This is probably a more complicated answer than you were looking for, but I hope it gives you an idea of what you are asking. This can be very confusing and I glossed over some parts of it quickly, so feel free to ask for clarification. If you need the solution to work without the assumption that the initial image appears without rotation, let me know; I just didn't want to make it more complicated than it needed to be.
I need to find the pose (rotation matrix + translation vector) for a camera, and for that I am using cv2.solvePnP(), but the results I get from photos don't match.
In order to debug, I created (with numpy) a "debugging 3d scene" composed of some object points (four corners of a square), some camera points (focal point, principal point and four corners of the virtual projection plane) and parameters (focal distance, initial orientation).
Then, I construct a general rotation matrix by multiplying three axis rotation matrices, apply this general rotation to the camera (numpy.dot()), project the object points in the virtual projection plane (line-plane intersection algorithm), and calculate the in-plane 2D coordinates (point-line distance) to the projection plane axes.
After doing this (objectpoints to imagepoints via rotationmatrix), I feed the imagepoints and the objectpoints to cv2.Rodrigues(cv2.solvePnP(...)) and get a matrix "not quite identical" to the one I used, differing only by a transposition and by some elements having the opposite sign (negative vs. positive), respecting this element-wise relation:
solvepnp_rotmatrix = my_original_matrix.transpose * [ 1  1  1]
                                                    [ 1  1 -1]
                                                    [-1 -1  1]
Although the rotation matrix mismatch is "solveable" with this hack, the TRANSLATION VECTOR gives coordinates that don't make sense to me.
I suspect there are mismatches between my 3D model (handedness, axes orientation, order of rotations) and the model used by opencv:
I use an OpenGL-like coordinate system (X increases to the right, Y increases upwards, and Z increases toward the observer);
I applied the rotations in the order that made the most sense to me (all right-handed, first around global Z, then around global X, then around global Y);
The image plane is between the object and the camera focal point (virtual projection plane, instead of real/CCD);
The origin of my image plane (virtual CCD) is the lower-left corner (Xpix increases to the right, Ypix increases upwards).
My questions are:
Given that the terms of the rotation matrix are identical, only transposed and with a different sign in some terms, is it possible that I am confusing some of OpenCV's conventions (handedness, order of rotations, axis direction)? And how can I discover which one(s)?
Also, is there a way to relate my handmade translation vector to the tvec returned by solvePnP? (of course, ideally, the best would be to make the coordinate systems to match, in the first place).
Any help will be most welcome!
I am using glMatrix to code WebGL and want to get the eye position, focal point, and up direction from the existing projection and view matrices (kind of like the reverse of the lookAt function). Is there any way to do this?
I didn't implement one, no. I'm not even sure that you could decompose it into the original vectors, for that matter. The lookAt point could be anywhere along a ray from the origin, and how would you determine what the appropriate up vector was? I'm thinking this is a one-way algorithm (just too lazy to prove it!)
Beyond that, however, I question whether you would want to do this even if there was a method for it. I'll be willing to bet that it's almost always more beneficial to track the values you're using and manipulate them rather than to try and pull them back and forth from matrix to vectors and back.
Yes and no: yes, you can invert the model-view transformation, and no, you will not get all three vectors back exactly the same.
The model view transformation of lookAt is very similar to the connectTo operation as used in CSG models. It is mounting your scene in front of your camera. This is done by translation and three axis rotations. The eye point is translated to (0,0,0) and all further rotation is done around it. You can easily derive the eye point by transforming (0,0,0) with the inverse matrix.
But the center point is just used for aligning the axis of view with the -Z axis. In OpenGL the eye faces -Z. The distance between center and eye is lost. So you can easily get a center point along your axis of view if you define the distance yourself. Let's say we want a distance of d. Then we just need to transform (0,0,-d) with the inverse matrix and we get a valid center point, though not exactly the same one. The center point defines only two rotation angles, the camera pan and tilt.
Even worse is the reconstruction of the up vector. It is only used for the roll angle of the camera and thus only for one scalar value. Thus for the inverse transformation you are not limited to a positive value along the Y axis; you could choose any point in the YZ plane with a positive Y value. To get an up vector perfectly normal to the viewing axis and of size 1, we just transform (0,1,0) with the inverse matrix. Remember to transform it as a vector this time (not as a point).
Now we have eye, center and up reconstructed in a way to get exactly the same result of lookAt next time. But since this matrix contains only 6 values of information (translation,pan,tilt,roll) we had to choose 3 values that were lost (distance center to eye, size and angle of up vector in YZ plane of camera).
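The reconstruction can be sketched in a few lines. The example below is NumPy rather than glMatrix, and look_at is a hand-written gluLookAt-style matrix in the mathematical column-vector convention (an assumption on my part; glMatrix stores its matrices column-major, but the math is the same). The camera values and the distance of 2 are made up.

```python
import numpy as np

def look_at(eye, center, up):
    """gluLookAt-style view matrix for column vectors (M @ p)."""
    eye, center, up = (np.asarray(v, dtype=float) for v in (eye, center, up))
    f = center - eye
    f /= np.linalg.norm(f)
    s = np.cross(f, up)
    s /= np.linalg.norm(s)
    u = np.cross(s, f)
    m = np.eye(4)
    m[0, :3], m[1, :3], m[2, :3] = s, u, -f
    m[:3, 3] = [-s @ eye, -u @ eye, f @ eye]
    return m

def decompose_look_at(view, distance=1.0):
    """Recover an eye, a center at the chosen distance, and a unit up vector."""
    inv = np.linalg.inv(view)
    eye = (inv @ np.array([0.0, 0.0, 0.0, 1.0]))[:3]           # point, w = 1
    center = (inv @ np.array([0.0, 0.0, -distance, 1.0]))[:3]  # point, w = 1
    up = (inv @ np.array([0.0, 1.0, 0.0, 0.0]))[:3]            # vector, w = 0
    return eye, center, up

original_eye = np.array([3.0, 4.0, 5.0])
original_center = np.array([0.0, 1.0, 0.0])
original_up = np.array([0.2, 1.0, 0.0])
view = look_at(original_eye, original_center, original_up)

eye, center, up = decompose_look_at(view, distance=2.0)
rebuilt = look_at(eye, center, up)

print(np.allclose(eye, original_eye))        # True: the eye is fully recoverable
print(np.allclose(center, original_center))  # False: only the direction survives
print(np.allclose(rebuilt, view))            # True: same matrix from the new triple
```

The recovered center and up differ from the inputs, but feeding them back into look_at reproduces the same view matrix, which is the sense in which the inversion succeeds.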
The model-view matrix can of course hold other transformations (any affine one), but the lookAt function uses this matrix only for translation and rotation. It positions the scene in front of the camera without distorting it.