I am not too familiar with the Eigen library and I am stuck on this piece of code:
motor1_to_motor2 =
Translation3f(BigApeLeg::distanceJoint1ToJoint3(),0,0)
* AngleAxisf(M_PI_2, Vector3f::UnitX())
* AngleAxisf(M_PI_2, Vector3f::UnitY());
motor1_to_motor2 is an Eigen::Affine3f.
BigApeLeg::distance... must return a float.
My issue is: what are Translation3f and AngleAxisf? What do they do, and what do they return?
I am familiar with some basic transformations. I would really appreciate it if someone could give me some pointers. Thanks!
As its name suggests, a Translation3f represents a 3D translation using floats, and an AngleAxisf represents a 3D rotation by a given angle around a given axis. Both expressions are constructor calls (they build temporary objects), not function calls.
motor1_to_motor2 is thus an affine transformation that applies a rotation around Y, followed by a rotation around X, and finally a translation along the X axis.
This doc should give you a good introduction to space transformations in Eigen.
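For example, the following minimal sketch (my own illustration, not from the original code base; the distance value is just a placeholder for BigApeLeg::distanceJoint1ToJoint3()) builds the same kind of transform and applies it to a point:

#include <Eigen/Geometry>
#include <cmath>
#include <iostream>

int main()
{
    using namespace Eigen;
    const float d = 0.1f;                             // placeholder for distanceJoint1ToJoint3()
    Affine3f motor1_to_motor2 =
        Translation3f(d, 0, 0)                        // translation along X
        * AngleAxisf(M_PI_2, Vector3f::UnitX())       // 90 degree rotation about X
        * AngleAxisf(M_PI_2, Vector3f::UnitY());      // 90 degree rotation about Y

    // A point is rotated about Y first, then about X, and finally translated:
    Vector3f p(1, 0, 0);
    std::cout << (motor1_to_motor2 * p).transpose() << std::endl;
    return 0;
}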
I am trying to do 3D reconstruction using SfM (Structure from Motion). I am pretty new to computer vision and doing this as a hobby, so if you use acronyms, please also let me know what they stand for so I can look them up.
Learning-wise, I have been following these resources:
https://www.youtube.com/watch?v=SyB7Wg1e62A&list=PLgnQpQtFTOGRYjqjdZxTEQPZuFHQa7O7Y&ab_channel=CyrillStachniss
https://imkaywu.github.io/tutorials/sfm/#triangulation
Plus the links below, from my quick question.
My end goal is to use this on a person's face, to create a 3D face reconstruction. If people have advice on this topic specifically, please let me know as well.
I do the following steps:
I/O using OpenCV. A video taken with a single camera.
Find the intrinsic parameters and distortion coefficients of the camera using Zhang's method.
Use SIFT to find features from frame 1 and frame 2.
Feature matching is done using cv2.FlannBasedMatcher().
Compute essential matrix using cv2.findEssentialMat().
The projection matrix of frame 1 is set to numpy.hstack((numpy.eye(3), numpy.zeros((3, 1)))).
Rotation and translation are obtained using cv2.recoverPose().
Using the rotation and translation, we get the projection matrix of frame 2:
curr_proj_matrix = cv2.hconcat([curr_rotation_matrix, curr_translation_matrix]).
I use cv2.undistortPoints() on the feature points for frames 1 and 2, using the information from step 2.
Lastly, I do the triangulation: points_4d = triangulation.triangulate(prev_projection_matrix, curr_proj_matrix, prev_pts_u, curr_pts_u).
Then I reassign the prev values to be equal to the curr values and continue through the video.
I use matplotlib to display the scatter plot.
Quick question:
Why do some articles compute E = (K^-1)^T * F * K and others E = K^T * F * K?
First way: what do I do with the fundamental matrix?
Second way: https://harish-vnkt.github.io/blog/sfm/
Issue:
As you can see, the scatter plot looks a bit warped. I am unsure why, whether I am missing a step, or whether I am doing something wrong, hence I am looking for advice.
Also, the Z axis is all negative.
One guess I had was that the video is 60 FPS, and even though I am moving the camera relatively quickly, there might not be enough rotation + translation between frames to determine the triangulation. However, removing frames in between did not make much difference.
Please let me know if you would like me to provide some of the code.
I believe I have an answer, but I am not sure why it works. Hence, if someone could expand on it, and also mention what the 4th component of the 4D points is, I will accept that answer and delete this.
Doing this on the 4D points after triangulation: points_4d /= points_4d[3] (1)
The documentation does not mention it: https://docs.opencv.org/4.5.3/d9/d0c/group__calib3d.html#gad3fc9a0c82b08df034234979960b778c
My best guess is that doing (1) is similar to calling cv2.convertPointsFromHomogeneous(): converting from homogeneous space to Euclidean space.
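In symbols, (1) does

\[(X,\; Y,\; Z,\; W) \;\mapsto\; (X/W,\; Y/W,\; Z/W,\; 1),\]

i.e. the standard homogeneous-to-Euclidean conversion, which matches what cv2.convertPointsFromHomogeneous() is documented to do (apart from dropping the trailing 1).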
Edit 2021-10-03: please see a comment below for further explanation.
I've been playing around with both Matlab and Apple's documentation regarding CMRotationMatrix for weeks.
I've found that I can easily re-create CMRotationMatrix by calculating it from roll, yaw and pitch.
However, I've found no resources/documentation on how to create a rotation matrix from the XYZ values of either gravity or userAcceleration.
All I found was how they create a 4x4 matrix in their VideoSnake demo.
So my question is: does anyone have any input on how to create a 3x3 matrix from XYZ rotations?
To begin with, rotation matrices have wide applications in physics, geometry and computer graphics. Since you mention gravity and userAcceleration, we are on the physics side of that range: both readings are vectors measured along the device's own XYZ axes.
Getting to the meat of the matter, those raw XYZ values on their own are just components along the sensor axes; they do not single out any particular angle or orientation as a starting point.
This is the part you have to understand: since those figures are expressed in the device's arbitrary frame, you need to convert them into direction vectors that can be interpreted in real-world coordinates (gravity, for example, gives you the "down" direction).
Only then can you relate the XYZ readings to a rotation matrix.
To conclude:
The essence of using direction vectors is that once the same directions are known in both frames, the rotation matrix is simply the transform between the world frame and the platform-local coordinates.
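To make the "direction vectors" idea concrete, here is a minimal sketch (my own illustration in C++/Eigen, not the CoreMotion API; all names are made up, and heading/yaw cannot be recovered from gravity alone, so an arbitrary reference direction is assumed):

#include <Eigen/Dense>
#include <cmath>

// Build a 3x3 rotation that maps device-frame vectors into a world frame
// whose "up" axis opposes the measured gravity. Gravity carries no yaw
// information, so the heading reference here is arbitrary.
Eigen::Matrix3f rotationFromGravity(const Eigen::Vector3f& gravityInDeviceFrame)
{
    Eigen::Vector3f up = -gravityInDeviceFrame.normalized(); // world "up", expressed in device coordinates
    Eigen::Vector3f ref(1.f, 0.f, 0.f);                      // arbitrary heading reference
    if (std::abs(up.dot(ref)) > 0.99f)                       // avoid a degenerate cross product
        ref = Eigen::Vector3f(0.f, 1.f, 0.f);
    Eigen::Vector3f east  = up.cross(ref).normalized();
    Eigen::Vector3f north = east.cross(up);                  // completes a right-handed basis

    Eigen::Matrix3f R;
    R.row(0) = east.transpose();                             // each row: one world axis in device coordinates
    R.row(1) = north.transpose();
    R.row(2) = up.transpose();
    return R;                                                // R * v_device = v expressed in the world frame
}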
I'm studying Introduction to Robotics and found that there are different orderings of the elementary transformations used to build the DH-parameter transformation matrix that determines the position and orientation of a robot's end effector. They are:
1.
Translate by d_i along the z_i-axis.
Rotate counterclockwise by theta_i about the z_i-axis.
Translate by a_{i-1} along the x_{i-1}-axis.
Rotate counterclockwise by alpha_{i-1} about the x_{i-1}-axis.
2.
Rotate by theta_i about the z_i-axis.
Translate by d_i along the z_i-axis.
Translate by a_{i-1} along the x_{i-1}-axis.
Rotate by alpha_{i-1} about the x_{i-1}-axis.
3.
Rotate by alpha_{i-1} about the x_{i-1}-axis.
Translate by a_{i-1} along the x_{i-1}-axis.
Rotate by theta_i about the z_i-axis.
Translate by d_i along the z_i-axis.
What is the difference between them? Will the result be different?
Which one should I use when calculating the position and orientation?
As far as I know there is no difference. They should all give you the same end result, but be consistent: pick one form and stick with it.
The main problem comes when you try to reverse the process. Using method 1 to go from time t to t+1 is fine, but to go back from t+1 to t you have to invert method 1 as well. Using another method for the reverse transform (though it should technically work) usually doesn't, because of nonlinearities in the modeling and rounding errors in the rotation terms (the cos and sin entries).
This isn't really surprising; it's the same issue you encounter when going from a local reference frame (with respect to the robot) to a global reference frame. The order of translations and rotations must be maintained for the forward and backward transformations.
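As a sketch of how form 3, for example, composes into a single homogeneous transform (C++/Eigen, my own illustration, multiplying the four elementary motions in the order they are listed, each taken about the current frame; the parameter names are just the usual DH symbols):

#include <Eigen/Geometry>

// One link transform, form 3: Rx(alpha_{i-1}) * Tx(a_{i-1}) * Rz(theta_i) * Tz(d_i)
Eigen::Affine3f dhLink(float alpha_prev, float a_prev, float theta, float d)
{
    using namespace Eigen;
    return Affine3f(AngleAxisf(alpha_prev, Vector3f::UnitX()))  // rotate alpha_{i-1} about x_{i-1}
         * Translation3f(a_prev, 0.f, 0.f)                      // translate a_{i-1} along x_{i-1}
         * AngleAxisf(theta, Vector3f::UnitZ())                 // rotate theta_i about z_i
         * Translation3f(0.f, 0.f, d);                          // translate d_i along z_i
}

// Chaining one such transform per joint (base to end effector) then gives
// the end-effector pose: T = dhLink(...) * dhLink(...) * ... ;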
I got a task:
We have a working system where a camera does a half circle around a human head. We know the camera matrix and the rotation/translation of every frame. (Distortion and more... but first I want to work without those parameters.)
My task is that I have only the camera matrix, which is constant over this move, and the images (more than 100). Now I have to get the translation and rotation frame by frame and compare them with the rotation and translation from the real world (which I get from the system, but only for comparison; I have to prove it!).
The first steps I have done so far:
use the robustMatcher from the OpenCV Cookbook - works fine - 40-70 matches per frame - visually it looks very good!
I get the fundamental matrix with getFundamental(), using the robust points from the robustMatcher and RANSAC.
Once I have F, I can get the essential matrix E with my camera matrix K like this:
cv::Mat E = K.t() * F * K; //Found at the Bible HZ Chapter 9.12
Now we need to extract R and t out of E with an SVD. By the way, camera 1's position is just set to zero because we have to start somewhere.
cv::SVD svd(E);
cv::Matx33d W(0,-1,0, //HZ 9.13
1,0,0,
0,0,1);
cv::Matx33d Wt(0,1,0, // W^T
-1,0,0,
0,0,1);
cv::Mat R1 = svd.u * cv::Mat(W) * svd.vt; //HZ 9.19
cv::Mat R2 = svd.u * cv::Mat(Wt) * svd.vt; //HZ 9.19
//R1 or R2???
R = R1; //R2
//t=+u3 or t=-u3?
t = svd.u.col(2); //=u3
This is my current status!
My plans are:
triangulate all points to get 3D points
join frame i with frame i+1
visualize my 3D points somehow!
Now my questions are:
Is this robust matcher dated? Is there another method?
Is it wrong to use these points as described in my second step? Must they be corrected for distortion or something?
What R and t is it that I extract here? Is it the rotation and translation between camera 1 and camera 2, from camera 1's point of view?
When I read the bible, or papers, or elsewhere, I find that there are 4 possibilities for what R and t can be!
P' = [U W V^T | +u3] or [U W V^T | -u3] or [U W^T V^T | +u3] or [U W^T V^T | -u3]
P' is the projection matrix of the second image.
That means t could be - or + and R could be totally different?!
I found out that I should triangulate one point into 3D and check whether that point is in front of both cameras; then I have found the correct matrix!
I found some of this code on the internet, and the author just used it with no further calculation:
cv::Mat R1 = svd.u * cv::Mat(W) * svd.vt;
and
t = svd.u.col(2); //=u3
Why is this correct? If it isn't, how should I do this triangulation in OpenCV?
I compared this translation to the translation which is given to me. (First I had to transform the given translation and rotation into the coordinate system of camera 1, but I have that now!) But it's not the same. The values from my program are, let's call it, jumping from plus to minus. But they should be more constant, because the camera is moving in a constant circle.
I am sure that some axes may be switched. I know that the translation is only in the range -1 to 1, but I thought I could extract a scale factor between my results and my comparison values, and then they should be similar.
Has somebody done something like this before?
Many people do camera calibration using a chessboard, but I can't use this method to get the extrinsic parameters.
I know that VisualSFM can do this somehow. (On YouTube there is a video where someone walks around a tree and gets a reconstruction of this tree from those pictures using VisualSFM.)
This is pretty much the same as what I have to do.
Last question:
Does somebody know an easy way to visualize my 3D points? I prefer MeshLab. Any experience with that?
Many people do camera calibration using a chessboard, but I can't use this method to get the extrinsic parameters.
A chessboard or checkerboard is used to find the internal/intrinsic matrix/parameters, not the extrinsic parameters. You're saying you already have the internal matrix; I suppose that's what you meant by
We know the camera matrix and ...
Those videos you have seen on YouTube did the same; the camera was already calibrated, that is, the internal matrix was known.
Is this robust matcher dated? Is there another method?
I don't have that book, so I can't see the code and answer this.
Is it wrong to use these points as described in my second step? Must they be corrected for distortion or something?
You need to cancel the radial distortion first; see undistortPoints.
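For example, a small sketch (placeholder names, assuming the matches are stored as std::vector<cv::Point2f>); passing K again as the last argument keeps the output in pixel coordinates, while omitting it returns normalized coordinates:

#include <vector>
#include <opencv2/opencv.hpp>

std::vector<cv::Point2f> undistortMatches(const std::vector<cv::Point2f>& pts,
                                          const cv::Mat& K, const cv::Mat& distCoeffs)
{
    std::vector<cv::Point2f> out;
    cv::undistortPoints(pts, out, K, distCoeffs, cv::noArray(), K);
    return out;
}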
What R and t is it that I extract here? Is it the rotation and translation between camera 1 and camera 2, from camera 1's point of view?
R is the orientation of the second camera in the first camera's coordinate system, and T is the position of the second camera in that coordinate system. These have several uses.
When I read the bible, or papers, or elsewhere, I find that there are 4 possibilities for ...
Read the relevant section of the bible; this is very well explained there. Triangulation is the naive method; a better approach is explained there as well.
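To make the naive check concrete, here is a hedged sketch (not the book's code; variable names are placeholders and K, R, t are assumed to be CV_64F): triangulate the matches with each of the four candidates and keep the (R, t) for which the points land in front of both cameras. Newer OpenCV versions also do this check for you inside cv::recoverPose.

#include <vector>
#include <opencv2/opencv.hpp>

// Count how many triangulated points lie in front of both cameras for one
// candidate (R, t); run it for all four candidates and keep the winner.
int countInFrontOfBothCameras(const cv::Mat& K, const cv::Mat& R, const cv::Mat& t,
                              const std::vector<cv::Point2f>& pts1,
                              const std::vector<cv::Point2f>& pts2)
{
    cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_64F);        // first camera: K [I|0]
    cv::Mat Rt;
    cv::hconcat(R, t, Rt);
    cv::Mat P2 = K * Rt;                                 // second camera: K [R|t]

    cv::Mat X;                                           // 4xN homogeneous points
    cv::triangulatePoints(P1, P2, pts1, pts2, X);
    X.convertTo(X, CV_64F);

    int inFront = 0;
    for (int i = 0; i < X.cols; ++i) {
        cv::Mat x = X.col(i) / X.at<double>(3, i);       // divide out the homogeneous coordinate
        double z1 = x.at<double>(2, 0);                  // depth in camera 1
        cv::Mat x2 = R * x.rowRange(0, 3) + t;           // the same point in camera 2's frame
        double z2 = x2.at<double>(2, 0);
        if (z1 > 0 && z2 > 0)
            ++inFront;
    }
    return inFront;
}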
Does somebody know an easy way to visualize my 3D points?
To see them in MeshLab, a very easy way is to save the coordinates of the 3D points in a PLY file; this is an extremely simple format, supported by MeshLab and almost all other 3D model viewers.
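A minimal sketch of such a PLY writer (ASCII, vertex positions only; the names here are placeholders):

#include <fstream>
#include <string>
#include <vector>
#include <opencv2/core/core.hpp>

void writePly(const std::string& path, const std::vector<cv::Point3f>& points)
{
    std::ofstream out(path.c_str());
    out << "ply\n"
        << "format ascii 1.0\n"
        << "element vertex " << points.size() << "\n"
        << "property float x\n"
        << "property float y\n"
        << "property float z\n"
        << "end_header\n";
    for (const cv::Point3f& p : points)
        out << p.x << " " << p.y << " " << p.z << "\n";
}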
In this article "An Efficient Solution to the Five-Point Relative Pose Problem", Nistér explain a very good method to determine which of the four configurations it the correct one (talking about R and T).
I've tried the robust matcher and I think is quiet good. The problems that has this matcher is that is really slow because it uses SURF, maybe you should try with others detectors and extractors to improve the speed.I also believe that the function in OpenCV that calculates the fundamental matrix does not need the Ransac parameter because the methods rate and symmetry do a great job removing the outliers, you should try the 8-point parameter.
OpenCV has the function triangulate, this only needs two projection Matrices, points that are in the first and the second image. Check the calib3d module.
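A short usage sketch (placeholder names; P1 and P2 are the two 3x4 projection matrices, pts1/pts2 the matched, undistorted points):

#include <vector>
#include <opencv2/opencv.hpp>

std::vector<cv::Point3f> triangulate(const cv::Mat& P1, const cv::Mat& P2,
                                     const std::vector<cv::Point2f>& pts1,
                                     const std::vector<cv::Point2f>& pts2)
{
    cv::Mat points4D;                                         // 4xN homogeneous coordinates
    cv::triangulatePoints(P1, P2, pts1, pts2, points4D);
    points4D.convertTo(points4D, CV_32F);

    std::vector<cv::Point3f> points3D;                        // Euclidean coordinates
    cv::convertPointsFromHomogeneous(points4D.t(), points3D); // divides by the 4th coordinate
    return points3D;
}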
I'm having trouble understanding how the camera works in my test application. I've been able to piece together a working camera; now I am trying to make sure I understand how it all works. My camera is encapsulated in its own class. Here is the Update method that gets called from my Game.Update() method:
public void Update(float dt)
{
Yaw += (200 - Game.MouseState.X) * dt * .12f;
Pitch += (200 - Game.MouseState.Y) * dt * .12f;
Mouse.SetPosition(200, 200);
_worldMatrix = Matrix.CreateFromAxisAngle(Vector3.Right, Pitch) * Matrix.CreateFromAxisAngle(Vector3.Up, Yaw);
float distance = _speed * dt;
if (_game.KeyboardState.IsKeyDown(Keys.E))
MoveForward(distance);
if (_game.KeyboardState.IsKeyDown(Keys.D))
MoveForward(-distance);
if (_game.KeyboardState.IsKeyDown(Keys.S))
MoveRight(-distance);
if (_game.KeyboardState.IsKeyDown(Keys.F))
MoveRight(distance);
if (_game.KeyboardState.IsKeyDown(Keys.A))
MoveUp(distance);
if (_game.KeyboardState.IsKeyDown(Keys.Z))
MoveUp(-distance);
_worldMatrix *= Matrix.CreateTranslation(_position);
    _viewMatrix = Matrix.Invert(_worldMatrix); // What's going on here???
}
First of all, I understand everything in this method other than the very last part where the matrices are being manipulated. I think the terminology is getting in my way as well. For example, my _worldMatrix is really a Rotation Matrix. What really baffles me is the part where the _viewMatrix is calculated by inverting the _worldMatrix. I just don't understand what this is all about.
In prior testing, I always used Matrix.CreateLookAt() to create a view matrix, so I'm a bit confused. I'm hoping someone can explain in simple terms what is going on.
Thanks,
-Scott
One operation the view matrix performs for the graphics pipeline is converting a 3D point from world space (the x, y, z we all know and love) into view (or camera) space: a space where the camera is considered the center of the world (0,0,0) and all points/objects are positioned relative to it. So while a point may be at (1,1,1) relative to the world, what are its coordinates relative to the camera location? As it turns out, to find out you can transform that point by the inverse of a matrix representing the camera's world-space position/rotation.
It kind of makes sense if you think about it. Let's say the camera position is (2,2,2) and an arbitrary point is at (3,3,3). We know that the point is (1,1,1) away from the camera, right? So what transformation would you apply to the point (3,3,3) for it to become (1,1,1), its location relative to the camera? You would translate (3,3,3) by (-2,-2,-2) to get (1,1,1), and (-2,-2,-2) is also the camera's position, inverted. That example uses translation because it is relatively easy to grok, but basically the same thing happens for rotation. Just don't expect to be able to simply negate all the basis vectors to invert a matrix; there is a little more going on than that for the rotation part.
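Written out with the numbers from that example (using column vectors; if I remember correctly XNA itself multiplies row vectors on the left, but the idea is the same):

\[ p_{\text{view}} = W_{\text{camera}}^{-1}\, p_{\text{world}} = T(2,2,2)^{-1}\,(3,3,3) = T(-2,-2,-2)\,(3,3,3) = (1,1,1). \]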
The Matrix.CreateLookAt() method returns an already-inverted matrix, so you don't really notice this happening unless you inspect its code.
Taking that one step further, the projection matrix then takes that point in view space and projects it onto a flat surface, so the point that started out in 3D space is now in 2D space.