I've received two conflicting answers about the order in which to multiply matrices in Direct3D. Tutorials do say to multiply from left to right, and that's fine, but it's not how I would visualize it.
Here's an example:
OpenGL (reading from top to bottom):
glRotatef(90.0f, 0.0f, 0.0f, 1.0f);
glTranslatef(20.0f, 0.0f, 0.0f);
So you visualize the world axes rotating 90 degrees about z. Then you translate 20.0 on the now-rotated x-axis, so it looks like you are going up on the world y-axis.
In Direct3D, doing:
wm = rotatem * translatem;
is different. It looks like the object was just rotated at the origin and translated on the world's x-axis so it goes to the right and not up. It only works once I reverse the order and read from right to left.
Also, for example, in Frank Luna's book on DX10, he goes into explaining how to do mirror reflections. I get all of that, but when he does, for example:
reflection_matrix = world_m * reflection_m;
around the xy plane, do I interpret this as first doing the world positioning and then the reflection, or the opposite?
The problem is that the order in which you are multiplying the matrices to get the composite transform matrix is reversed from what it should be. You are doing wm = rotatem * translatem, which follows the order of operations you would use in OpenGL, but for DirectX the matrix should be wm = translatem * rotatem.
The fundamental difference between OpenGL and DirectX arises from the fact that OpenGL treats matrices in column-major order, while DirectX treats matrices in row-major order.
To go from column-major to row-major you need to take the transpose (swap the rows and the columns) of the OpenGL matrix.
So, if you write wm = rotatem * translatem in OpenGL, then you want the transpose of that for DirectX, which is:
wm^T = (rotatem * translatem)^T = translatem^T * rotatem^T
which explains why the order of the matrix multiply has to be reversed in DirectX.
See this answer. In OpenGL, each subsequent operation is a pre-multiplication of all the operations before it, not a post-multiplication. You can think of multiplying a vector by a matrix as a function evaluation.
If what you want is to first rotate a vector and then translate the rotated vector, which in OpenGL you would do by first calling glRotatef and then calling glTranslatef, you could express that using function calls as
myNewVector = translate(rotate(myOldVector))
The rotate function does this
rotate(anyVector) = rotationMatrix * anyVector
and the translate function does this
translate(anyOtherVector) = translationMatrix * anyOtherVector
so your equivalent expression using matrix multiplications would look like
myNewVector = translationMatrix * rotationMatrix * myOldVector
That is, your combined transformation matrix would be translationMatrix * rotationMatrix.
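As a minimal sketch of that composition (GLM is used purely for illustration here and is not part of the question; it follows the same column-vector convention as the math above):

// Sketch using GLM to show the composition order: the combined matrix is
// T * R, so the rotation is applied to the vector first.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::vec4 transformExample(const glm::vec4& myOldVector)
{
    glm::mat4 R = glm::rotate(glm::mat4(1.0f), glm::radians(90.0f),
                              glm::vec3(0.0f, 0.0f, 1.0f));    // rotate about z
    glm::mat4 T = glm::translate(glm::mat4(1.0f),
                                 glm::vec3(20.0f, 0.0f, 0.0f)); // then translate
    glm::mat4 world = T * R;    // combined transform: translationMatrix * rotationMatrix
    return world * myOldVector; // myNewVector = T * R * myOldVector
}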
Given an object's 3D mesh file and an image that contains the object, what are some techniques to get the orientation/pose parameters of the 3d object in the image?
I tried searching for some techniques, but most seem to require texture information of the object or at least some additional information. Is there a way to get the pose parameters using just an image and a 3d mesh file (wavefront .obj)?
Here's an example of a 2D image that can be expected.
FOV of camera
The field of view of the camera is the absolute minimum you need to know to even start with this (how can you determine where to place the object when you have no idea how it would affect the scene?). Basically you need a transform matrix that maps from the world GCS (global coordinate system) to camera/screen space and back. If you do not have a clue what I am writing about, then perhaps you should not try any of this before you learn the math.
For an unknown camera you can do some calibration based on markers or etalons (known size and shape) in the view. But it is much better to use the real camera values (like FOV angles in the x and y directions, focal length, etc.).
The goal is to create a function that maps world GCS (x,y,z) into screen LCS (x,y).
For more info read:
transform matrix anatomy
3D graphic pipeline
Perspective projection
Silhouette matching
In order to compare the similarity of the rendered and real images you need some kind of measure. As you need to match geometry, I think silhouette matching is the way to go (ignoring textures, shadows and such).
So first you need to obtain the silhouettes. Use image segmentation for that and create an ROI mask of your object. For the rendered image this is easy, as you can render the object in a single color without any lighting directly into the ROI mask.
Then you need to construct a function that computes the difference between the silhouettes. You can use any kind of measure, but I think you should start with the non-overlapping area pixel count (it is easy to compute).
Basically you count pixels that are present only in one ROI (region of interest) mask.
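For example, a minimal OpenCV sketch of that count (the mask names are placeholders I made up):

// Count pixels that belong to exactly one of the two ROI masks (XOR).
#include <opencv2/opencv.hpp>

// realMask and renderedMask are single-channel 8-bit masks (0 or 255) of the same size.
int silhouetteDifference(const cv::Mat& realMask, const cv::Mat& renderedMask)
{
    cv::Mat nonOverlap;
    cv::bitwise_xor(realMask, renderedMask, nonOverlap); // pixels present in only one mask
    return cv::countNonZero(nonOverlap);                 // smaller = better match
}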
estimate position
As you have the mesh, you know its size, so place it in the GCS so that the rendered image has a bounding box very close to the real image's. If you do not have the FOV parameters, then you need to rescale and translate each rendered image so it matches the image's bounding box (and as a result you obtain only the orientation, not the position, of the object, of course). Cameras have perspective, so the farther from the camera you place your object, the smaller it will be.
fit orientation
Render a few fixed orientations covering the whole orientation space with some step (for example 8^3 orientations). For each one, compute the silhouette difference and choose the orientation with the smallest difference.
Then fit the orientation angles around it to minimize the difference. If you do not know how optimization or fitting works, see this:
How approximation search works
Beware: too small a number of initial orientations can cause false positives or missed solutions, and too many will be slow.
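Here is a rough sketch of the coarse search (renderSilhouette is a hypothetical helper that renders the mesh mask at the given Euler angles; silhouetteDifference is the measure sketched earlier):

// Coarse orientation search: try a grid of Euler angles, keep the orientation
// with the smallest silhouette difference, then refine around it.
#include <opencv2/opencv.hpp>
#include <climits>

cv::Mat renderSilhouette(const cv::Vec3f& eulerAnglesDeg);                     // hypothetical helper
int silhouetteDifference(const cv::Mat& realMask, const cv::Mat& renderedMask); // see sketch above

cv::Vec3f coarseOrientationSearch(const cv::Mat& realMask, int steps = 8)
{
    cv::Vec3f best(0, 0, 0);
    int bestDiff = INT_MAX;
    const float step = 360.0f / steps;
    for (int i = 0; i < steps; ++i)
      for (int j = 0; j < steps; ++j)
        for (int k = 0; k < steps; ++k)
        {
            cv::Vec3f angles(i * step, j * step, k * step);
            int diff = silhouetteDifference(realMask, renderSilhouette(angles));
            if (diff < bestDiff) { bestDiff = diff; best = angles; }
        }
    return best; // then refine around 'best' with a smaller angular step
}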
That was the basics in a nutshell. As your mesh is not very simple, you may need to tweak this, for example by using contours instead of silhouettes, and the distance between contours (which is really hard to compute) instead of the non-overlapping pixel count. You should start with simpler meshes like a die or a coin, and once you grasp all of this, move on to more complex shapes.
[Edit1] algebraic approach
If you know some points in the image that correspond to known 3D points (in your mesh), then along with the FOV of the camera used you can compute the transform matrix placing your object ...
if the transform matrix is M (OpenGL style):
M = xx,yx,zx,ox
xy,yy,zy,oy
xz,yz,zz,oz
0, 0, 0, 1
Then any point from your mesh (x,y,z) is transformed to global world (x',y',z') like this:
(x',y',z') = M * (x,y,z)
The pixel position (x'',y'') is obtained by the camera FOV perspective projection (dividing by the depth along the viewing axis) like this:
y'' = focus*y'/(z'+focus) + ys2;
x'' = focus*x'/(z'+focus) + xs2;
where the camera is at (0,0,-focus), the projection plane is at z = 0 and the viewing direction is +z, so for any focal length focus (in pixels) and screen resolution (xs,ys):
xs2 = xs*0.5;
ys2 = ys*0.5;
and the half FOV angles satisfy:
tan(FOVx) = xs2/focus;
tan(FOVy) = ys2/focus;
When you put all this together you obtain:
xi'' = focus * ( xx*xi + yx*yi + zx*zi + ox ) / ( xz*xi + yz*yi + zz*zi + oz + focus ) + xs2
yi'' = focus * ( xy*xi + yy*yi + zy*zi + oy ) / ( xz*xi + yz*yi + zz*zi + oz + focus ) + ys2
where (xi,yi,zi) is the i-th known point's 3D position in mesh local coordinates and (xi'',yi'') is the corresponding known 2D pixel position. So the unknowns are the M values:
{ xx,xy,xz, yx,yy,yz, zx,zy,zz, ox,oy,oz }
So we get 2 equations per known point and 12 unknowns in total, which means you need at least 6 known points. Multiplying each equation through by its denominator makes it linear in the unknowns; solve the resulting system of equations and construct your matrix M.
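Here is a rough OpenCV sketch of setting up and solving that linearized system in a least-squares sense (the function and variable names are my own, and it assumes the division-based projection written above):

// Unknown vector m = [xx,yx,zx,ox, xy,yy,zy,oy, xz,yz,zz,oz].
// Each correspondence gives: focus*X - du*Z = du*focus and focus*Y - dv*Z = dv*focus,
// where X,Y,Z are the rows of M applied to the mesh point and du = x''-xs2, dv = y''-ys2.
#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat solvePose(const std::vector<cv::Point3f>& mesh,   // known (xi,yi,zi)
                  const std::vector<cv::Point2f>& pixels, // known (xi'',yi'')
                  float focus, float xs2, float ys2)
{
    const int n = static_cast<int>(mesh.size());
    cv::Mat A = cv::Mat::zeros(2 * n, 12, CV_32F);
    cv::Mat b = cv::Mat::zeros(2 * n, 1, CV_32F);
    for (int i = 0; i < n; ++i)
    {
        const float x = mesh[i].x, y = mesh[i].y, z = mesh[i].z;
        const float du = pixels[i].x - xs2, dv = pixels[i].y - ys2;
        float* ax = A.ptr<float>(2 * i);      // x'' equation
        ax[0] = focus * x; ax[1] = focus * y; ax[2]  = focus * z; ax[3]  = focus;
        ax[8] = -du * x;   ax[9] = -du * y;   ax[10] = -du * z;   ax[11] = -du;
        b.at<float>(2 * i) = du * focus;
        float* ay = A.ptr<float>(2 * i + 1);  // y'' equation
        ay[4] = focus * x; ay[5] = focus * y; ay[6]  = focus * z; ay[7]  = focus;
        ay[8] = -dv * x;   ay[9] = -dv * y;   ay[10] = -dv * z;   ay[11] = -dv;
        b.at<float>(2 * i + 1) = dv * focus;
    }
    cv::Mat m;
    cv::solve(A, b, m, cv::DECOMP_SVD); // least-squares solution for the 12 unknowns
    return m.reshape(1, 3);             // rows: (xx yx zx ox), (xy yy zy oy), (xz yz zz oz)
}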
Also you can exploit the fact that the rotation part of M is a uniform orthogonal/orthonormal matrix, so the vectors
X = (xx,xy,xz)
Y = (yx,yy,yz)
Z = (zx,zy,zz)
are perpendicular to each other, so:
(X.Y) = (Y.Z) = (Z.X) = 0.0
Introducing these into your system can lower the number of needed points. You can also exploit the cross product: if you know 2 of the vectors, the third can be computed as
Z = (X x Y)*scale
So instead of 3 variables you need just a single scale (which is 1 for an orthonormal matrix). If I assume an orthonormal matrix, then:
|X| = |Y| = |Z| = 1
so we get 6 additional equations (3 from the dot products and 3 from the cross product) without any additional unknowns, so 3 points are indeed enough.
I am having a little problem with my rotation. My model rotates, but then drifts slightly off the camera's view instead of staying centered. Here is my rotation code; is this correct?
Matrix m = Matrix.CreateRotationZ(MathHelper.ToRadians(30));
Matrix objectWorld = Matrix.CreateTranslation(objectPos) * m;
As Lucius notes in his comment, a matrix represents a sequence of linear transformations. The order in which you multiply them together is important.
RotationMatrix * TranslationMatrix means, "Rotate this model around its axis in model space, then translate the rotated model into world space."
TranslationMatrix * RotationMatrix means, "Translate this model into world space, then rotate the model around the world's origin."
Change the order of your matrix multiplications.
As an aside, multiplying matrices using the * operator can quickly become expensive when you start adding more transformations to the sequence; it can end up creating a lot of extra matrices that are never actually used. You can optimize this by using the static Multiply() method on the Matrix class.
I'm tracking a marker with ARToolKit+. I receive a model view matrix that looks about right. Now I'd like to warp the image in a way that the marker looks just like it would look if I looked straight at it. But whatever I do, the result looks just extremely distorted. I know that ARToolKit stores the 4x4 matrix in column major order, so I fixed that for OpenCV.
What I tried so far was:
1) fix the order to row major order
2) calculate the inverse with cvInvert (although transposing the 3x3 rotation part + inverting the translation should suffice)
3) use that matrix with cvWarpPerspective
Am I doing something wrong?
tl;dr:
I want this: https://www.youtube.com/watch?v=qZ-LU-C2p2Q
I get some distorted lines and lots of black instead.
Your problem is in converting from 4x4 to 3x3. The short answer is that you want to drop the 3rd column and bottom row to make the 3x3 and then premultiply with your camera matrix. For a longer explanation see here
Clarification
The pose you get from ARTK represents a transform from one place to another. When I say "the initial image appears without rotation" I meant that your transform goes from an initial state which has no rotation about the x or y axis to the current state. That is a fine assumption for most augmented reality applications, I mentioned it just to be thorough.
As for why you can drop the 3rd column. Since you are transforming a plane, your z coordinate can be completely expressed by your x and y coordinates given the equation of your plane. If we assume that initially there is no rotation then your initial z coordinate is a constant value. If there is rotation then z is not constant but it varies deterministically in x and y according to its plane equation which can still be expressed in one matrix (though you don't need that). Since in your case your 4x4 transform is probably expressing the transform from the marker lying flat at z = 0 to its current position, the 3rd column of your 4x4 matrix does nothing (it all gets multiplied by 0) so it can be dropped without affecting the result.
In short: forget about the rotation stuff, it's more complicated than you need. Just realize that the transform is from initial coordinates to final coordinates, and your initial coordinates are always
[x,y,0,1]
which makes your third column irrelevant.
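A minimal sketch of that reduction (the names are mine; it assumes the marker plane sits at z = 0 as described):

// Take the 4x4 pose, drop its 3rd column and bottom row to get a 3x3,
// then premultiply by the 3x3 camera intrinsic matrix.
#include <opencv2/opencv.hpp>

cv::Mat poseToHomography(const cv::Mat& pose4x4, // 4x4 pose, CV_64F, row major
                         const cv::Mat& camera)  // 3x3 intrinsics, CV_64F
{
    cv::Mat H(3, 3, CV_64F);
    int keepCols[3] = {0, 1, 3};                 // drop column 2 (the z column)
    for (int r = 0; r < 3; ++r)                  // drop the bottom row
        for (int c = 0; c < 3; ++c)
            H.at<double>(r, c) = pose4x4.at<double>(r, keepCols[c]);
    return camera * H;                           // premultiply by the intrinsics
}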
Update
I'm sorry! I just re-read your question and realized you just want to warp the marker so it looks like a straight-on view; I got caught up in describing a general transform from 4x4 to 3x3. The 4x4 transform you get from ARTK is not the transform that will de-warp the marker, it is the transform that moves the marker from the origin to its final position. To de-warp the marker as you asked, the process is similar but slightly different. I haven't done that before, but here is my guess.
First, you need to get the 4x4 transform between where the marker is in world space and where you would like it to appear to be after warping it. Right now the transform goes from the origin to the marker location. To change the transform so it goes from some point farther down the z axis (say 100) to the marker location, define the transform:
initial_marker_pose = [1,0,0,0
0,1,0,0
0,0,1,100
0,0,0,1];
Now you have the transform from the origin to what you want as your "initial" position, and the transform from the origin to your "final" position. To get the transform from initial to final, simply compute:
initial_to_final = origin_to_marker*initial_marker_pose.inv();
Now you would follow the process outlined in the link I gave you; in this case your initial z position is no longer 0, it is 100. Then when you are finished you will need to invert your 3x3 matrix. That is because this process takes you from a straight-on view to the one defined by the pose from ARTK, and you want the opposite of that. You will need to experiment with the initial z position: the smaller it is, the larger your marker will appear after de-warping.
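Here is a rough sketch of those steps in OpenCV (remember the recipe above is itself a guess; the names are mine and initialZ is the z position you would experiment with):

// Shift the "initial" pose down the z axis, form initial-to-final,
// reduce it to a 3x3 for the plane z = initialZ, invert, then warp.
#include <opencv2/opencv.hpp>

cv::Mat dewarpMarker(const cv::Mat& originToMarker, // 4x4 pose from ARTK, CV_64F
                     const cv::Mat& camera,         // 3x3 intrinsics, CV_64F
                     const cv::Mat& image, double initialZ = 100.0)
{
    cv::Mat initialMarkerPose = cv::Mat::eye(4, 4, CV_64F);
    initialMarkerPose.at<double>(2, 3) = initialZ;   // marker starts at z = initialZ
    cv::Mat G = originToMarker * initialMarkerPose.inv(); // initial_to_final
    cv::Mat H(3, 3, CV_64F);
    for (int r = 0; r < 3; ++r)                      // reduce 4x4 to 3x3 on the plane z = initialZ
    {
        H.at<double>(r, 0) = G.at<double>(r, 0);
        H.at<double>(r, 1) = G.at<double>(r, 1);
        H.at<double>(r, 2) = initialZ * G.at<double>(r, 2) + G.at<double>(r, 3);
    }
    H = camera * H;                                  // premultiply by the intrinsics
    cv::Mat dewarped;
    cv::warpPerspective(image, dewarped, H.inv(), image.size()); // invert: back to straight-on view
    return dewarped;
}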
Hopefully that works, sorry for the confusion about your question.
I'm doing video stabilization using optical flow. To make calcOpticalFlowPyrLK work faster I'm downscaling the original image 2x and running the function on that.
How can I modify the homography matrix (retrieved via findHomography) to be able to warpPerspective the original, larger, image?
This is a little late and the answer you have works fine but I have one thing to add. I don't like taking functions like getPerspectiveTransform for granted. In this case it is easy to just make the matrix yourself. Image reductions that are powers of 2 are easy. Suppose you have a point and you want to move it to an image with twice the resolution.
float newx = (oldx+.5)*2 - .5;
float newy = (oldy+.5)*2 - .5;
conversely, to go to an image of half the resolution...
float newx = (oldx+.5)/2 - .5;
float newy = (oldy+.5)/2 - .5;
Draw yourself a diagram if you need to and convince yourself it works; remember 0-indexing. Instead of thinking about making your transformation work on other resolutions, think about moving every point to the resolution of your transform, then using your transform, then moving it back. Fortunately you can do all of this in one matrix; we just need to build that matrix! First build a matrix for each of the three steps:
//move a point to an image of half the resolution; note it is equivalent to the equation above
project_down=(.5,0,-.25,
0,.5,-.25,
0, 0, 1)
//move a point to an image of twice the resolution; these two matrices are inverses of one another
project_up=(2,0,.5,
0,2,.5,
0, 0,1)
To make your final transformation just combine them
final_transform = [project_up][your_homography][project_down];
The nice thing is you only have to do this once for any given homography. This should work the same as getPerspectiveTransform (and probably run faster). Hopefully understanding this will help you deal with other questions you may run into regarding image resolution changes.
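For example, that combination written out with OpenCV (the matrix values are copied from the text; H is assumed to be the 3x3 CV_64F homography computed on the half-size image):

// Compose: full-res point -> half-res (project_down) -> H -> full-res (project_up).
#include <opencv2/opencv.hpp>

cv::Mat scaleHomographyToFullResolution(const cv::Mat& H)
{
    cv::Mat projectDown = (cv::Mat_<double>(3, 3) << 0.5, 0.0, -0.25,
                                                     0.0, 0.5, -0.25,
                                                     0.0, 0.0,  1.0);
    cv::Mat projectUp   = (cv::Mat_<double>(3, 3) << 2.0, 0.0,  0.5,
                                                     0.0, 2.0,  0.5,
                                                     0.0, 0.0,  1.0);
    return projectUp * H * projectDown; // final_transform, usable with warpPerspective
}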
Let B be the transformation you have computed. You can multiply B by another homography, A, to get AB = C, where C is a homography that does both transformations; this is equivalent to applying first B and then A. To find A you can use getPerspectiveTransform.
Edit: by AB I meant matrix multiplication, not element-wise multiplication.
Edit 2: to get A you pass the four corners of the two images in the same order to getPerspectiveTransform such that the corners of the downsampled image are the source points and the corners of the original image are the destination points.
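A small sketch of that (the image sizes and names are placeholders; B is assumed to be the 3x3 CV_64F homography from findHomography):

// Build A from the corners of the downsampled image (source) and the
// corners of the original image (destination), then compose C = A * B.
#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat upscaleHomography(const cv::Mat& B, cv::Size smallSize, cv::Size fullSize)
{
    std::vector<cv::Point2f> src = { {0.0f, 0.0f},
                                     {(float)smallSize.width - 1, 0.0f},
                                     {(float)smallSize.width - 1, (float)smallSize.height - 1},
                                     {0.0f, (float)smallSize.height - 1} };
    std::vector<cv::Point2f> dst = { {0.0f, 0.0f},
                                     {(float)fullSize.width - 1, 0.0f},
                                     {(float)fullSize.width - 1, (float)fullSize.height - 1},
                                     {0.0f, (float)fullSize.height - 1} };
    cv::Mat A = cv::getPerspectiveTransform(src, dst); // small -> full resolution
    return A * B;                                      // C = A * B (matrix product)
}

For the stabilization use case in the question, where B maps between two downscaled frames, you may also need to compose the full-to-small mapping on the other side of B, which matches the project_up * homography * project_down form in the previous answer.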
I've been working on a problem for several weeks and have reached a point that I'd like to make sure I'm not overcomplicating my approach. This is being done in OpenGL ES 2.0 on iOS, but the principles are universal, so I don't mind the answers being purely mathematical in form. Here's the rundown.
I have 2 points in 3D space along with a control point that I am using to produce a bezier curve with the following equation:
B(t) = (1 - t)^2 * P0 + 2 * (1 - t) * t * P1 + t^2 * P2
The start/end points are being positioned at dynamic coordinates on a fairly large sphere, so x/y/z varies greatly, making a static solution not so practical. I'm currently rendering the points using GL_LINE_STRIP. The next step is to render the curve using GL_TRIANGLE_STRIP and control the width relative to height.
According to this quick discussion, a good way to solve my problem would be to find points that are parallel to the curve along both sides, taking into account its direction. I'd like to create 3 curves in total, pass in the indices to create a bezier curve of varying width, and then draw it.
There's also talk of interpolation and using a Loop-Blinn technique that seem to solve the specific problems of their respective questions. I believe those solutions, however, might be too complex for what I'm going after. I'm also not interested in bringing textures into the mix. I prefer that the triangles are just drawn using the colors I'll calculate later on in my shaders.
So, before I go into more reading on trilinear interpolation, Catmull-Rom splines, the Loop-Blinn paper, or explore sampling further, I'd like to make sure which direction is most likely to be the best bet. I think the problem in its most basic form is to take a point in 3D space and find two parallel points alongside it that take into account the direction in which the next point will be plotted.
Thank you for your time and if I can provide anything further, let me know and I'll do my best to add it.
This answer does not (as far as I see) favor one of the methods you mentioned in your question, but is what I would do in this situation.
I would calculate the normalized normal (or binormal) of the curve. Let's say I take the normalized normal and have it as a function of t (N(t)). With this I would write a helper function to calculate the offset point P:
P(t, o) = B(t) + o * N(t)
Where o means the signed offset of the curve in normal direction.
Given this function one would simply calculate the points to the left and right of the curve by:
Points = [P(t, -w), P(t, w), P(t + s, -w), P(t + s, w)]
Where w is the half-width of the curve you want to achieve (the offset to each side of the center line).
Then connect these points via two triangles.
For use in a triangle strip this would mean the indices:
0 1 2 3
Edit
To do some work with the curve one would generally calculate the Frenet frame.
This is a set of 3 vectors (Tangent, Normal, Binormal) that gives the orientation in a curve at a given parameter value (t).
The Frenet frame is given by:
unit tangent = B'(t) / || B'(t) ||
unit binormal = (B'(t) x B''(t)) / || B'(t) x B''(t) ||
unit normal = unit binormal x unit tangent
In this example x denotes the cross product of two vectors and || v || means the length (or norm) of the enclosed vector v.
As you can see you need the first (B'(t)) and the second (B''(t)) derivative of the curve.
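As an illustrative sketch (GLM is my own choice here, not something from the question), here are the quadratic Bezier, its derivatives, the Frenet frame and the offset point P(t, o):

// Quadratic Bezier evaluation, derivatives, Frenet frame and offset point.
// Assumes the control points are not collinear (otherwise the cross product is zero).
#include <glm/glm.hpp>

struct QuadraticBezier { glm::vec3 p0, p1, p2; };

glm::vec3 evaluate(const QuadraticBezier& c, float t)        // B(t)
{
    float u = 1.0f - t;
    return u * u * c.p0 + 2.0f * u * t * c.p1 + t * t * c.p2;
}

glm::vec3 firstDerivative(const QuadraticBezier& c, float t) // B'(t)
{
    return 2.0f * (1.0f - t) * (c.p1 - c.p0) + 2.0f * t * (c.p2 - c.p1);
}

glm::vec3 secondDerivative(const QuadraticBezier& c)         // B''(t), constant for a quadratic
{
    return 2.0f * (c.p2 - 2.0f * c.p1 + c.p0);
}

// P(t, o) = B(t) + o * N(t), with N(t) taken from the Frenet frame
glm::vec3 offsetPoint(const QuadraticBezier& c, float t, float o)
{
    glm::vec3 tangent  = glm::normalize(firstDerivative(c, t));
    glm::vec3 binormal = glm::normalize(glm::cross(firstDerivative(c, t),
                                                   secondDerivative(c)));
    glm::vec3 normal   = glm::cross(binormal, tangent);
    return evaluate(c, t) + o * normal;
}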