What is the math that SCNLookAtConstraint is doing? I want to try to recreate this with vectors.
I think it can be done with a cross product and a dot product once you have the two directional vectors.
By default the node points in the direction of the negative z-axis of its local coordinate system.
The other direction we are interested in is the one from the looking node to the other node, expressed in the looking node's local coordinate system. You can get it by converting the positions using convertPosition:fromNode: or convertPosition:toNode:.
If not done already, normalize the two directional vectors.
With the two directions in the local coordinate system, a cross product between them gives a vector that is orthogonal to the plane spanned by the two directions. This vector is the surface normal of that plane. Rotating a vector that lies in the plane around that normal keeps it in the plane.
Since the two directions are normalized, a dot product of the two should give you cos(ϴ), where ϴ is the angle between the two.
Rotating the first vector (the one that points in the direction of the negative z-axis) by this angle around the normal to the plane should make it point in the same direction as the second vector (the one that points at the other node).
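A minimal NumPy sketch of the axis/angle computation described above, assuming the direction to the other node has already been converted into the looking node's local coordinates. The function name look_at_axis_angle is hypothetical, and the fallback axis for parallel/opposite directions (+Y) is an arbitrary choice of mine, since the cross product is zero in that case.

```python
import numpy as np

def look_at_axis_angle(target_local):
    """Axis and angle that turn the node's default forward direction (-Z) toward target_local.

    target_local: direction to the other node, already in the looking node's
    local coordinate system (e.g. obtained via convertPosition).
    """
    forward = np.array([0.0, 0.0, -1.0])            # default look direction
    target = np.asarray(target_local, dtype=float)
    target = target / np.linalg.norm(target)         # normalize the second direction
    axis = np.cross(forward, target)                 # normal to the plane of both directions
    cos_theta = np.clip(np.dot(forward, target), -1.0, 1.0)
    angle = np.arccos(cos_theta)                     # dot product of unit vectors = cos(theta)
    axis_len = np.linalg.norm(axis)
    if axis_len < 1e-9:
        # Parallel or opposite directions: the plane is undefined, so any axis
        # perpendicular to forward works (hypothetical choice: +Y); angle is 0 or pi.
        return np.array([0.0, 1.0, 0.0]), angle
    return axis / axis_len, angle
```

For example, a target straight to the right, [1, 0, 0], gives the axis [0, -1, 0] and an angle of pi/2.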
That should be the way it's done for two vectors (or at least one way to do it).
To do it for a node, set a rotation of that angle around that axis on the looking node. This rotates the node so that its local negative z-axis (the direction it's looking) points at the other node.
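To check that a rotation of that angle around that axis really sends -Z onto the target direction, the axis-angle pair can be turned into a 3x3 matrix with Rodrigues' formula. This is a sketch in NumPy rather than SceneKit; in SceneKit itself the same axis and angle would go into the node's axis-angle rotation. The function names are hypothetical.

```python
import numpy as np

def axis_angle_to_matrix(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix for `angle` radians about `axis`."""
    x, y, z = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])                     # cross-product matrix of the unit axis
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def look_at_rotation(target_local):
    """Rotation that makes the node's local -Z point along target_local."""
    forward = np.array([0.0, 0.0, -1.0])
    target = np.asarray(target_local, dtype=float)
    target = target / np.linalg.norm(target)
    axis = np.cross(forward, target)
    if np.linalg.norm(axis) < 1e-9:                  # parallel or opposite: pick any perpendicular axis
        axis = np.array([0.0, 1.0, 0.0])
    angle = np.arccos(np.clip(forward @ target, -1.0, 1.0))
    return axis_angle_to_matrix(axis, angle)
```

Applying the result to [0, 0, -1] should give the normalized target direction, including the opposite-direction case, where the rotation is 180 degrees about +Y.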
I have a very similar example in one of the chapters of 3D Graphics with Scene Kit, where a node is rotated to point straight out of the surface of a sphere. You can look at the sample code to see how it's solved there.
I'm coding a calibration algorithm for my depth camera. This camera outputs a single-channel 2D image with the distance of every object in the image.
From that image, and using the camera and distortion matrices, I was able to create a 3D point cloud from the camera's perspective. Now I wish to convert those 3D coordinates to global/world coordinates. But since I can't use any patterns like the chessboard to calibrate the camera, I need an alternative.
So I was thinking: if I provide some ground points (in the camera perspective), I would define a plane that I know should have a Z coordinate close to zero in the global perspective. So, how should I proceed to find the transformation matrix that makes the plane horizontal?
[Image: ground plane in local (camera) coordinates, with an object on top]
I tried using OpenCV's solvePnP, but it didn't give me the correct transformation. I also thought about using OpenCV's estimateAffine3D, but I don't know where the global coordinates should be mapped to, since the provided ground points do not need to lie on any specific pattern/shape.
Thanks in advance
What you need is what's commonly called extrinsic calibration: a rigid transformation relating the 3D camera reference frame to the 'world' reference frame. Usually, this is done by finding known 3D points in the world reference frame and their corresponding 2D projections in the image. This is what solvePnP does. Since your depth camera already gives you 3D points, you can instead work with 3D-to-3D correspondences.
To find the best rotation/translation between two sets of 3D points, in the sense of minimizing the root mean square error, the solution is:
Theory: https://igl.ethz.ch/projects/ARAP/svd_rot.pdf
Easier explanation: http://nghiaho.com/?page_id=671
Python code (from the easier explanation site): http://nghiaho.com/uploads/code/rigid_transform_3D.py_
So, if you want to transform 3D points from the camera reference frame, do the following:
As you proposed, define some 3D points with known position in the world reference frame, for example (but not necessarily) with Z=0. Put the coordinates in a Nx3 matrix P.
Get the corresponding 3D points in the camera reference frame. Put them in a Nx3 matrix Q.
Using the Python file linked above, call rigid_transform_3D(P, Q). This will return a 3x3 matrix R and a 3x1 vector t.
Then, for any 3D point p in the world reference frame, as a 3x1 vector, you can obtain the corresponding camera point q with:
q = R.dot(p)+t
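The SVD method explained in the two links above can be sketched in a few lines of NumPy. This is a self-contained re-implementation of that technique (the name rigid_transform_3d is my own), not the linked file itself; it follows the same convention, q = R.dot(p) + t, with P in world coordinates and Q in camera coordinates as (N, 3) arrays.

```python
import numpy as np

def rigid_transform_3d(P, Q):
    """Least-squares R, t such that Q[k] ~ R @ P[k] + t (SVD / Kabsch method).

    P, Q: (N, 3) arrays of corresponding points, N >= 3, not all collinear.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)         # centroids of both sets
    H = (P - cp).T @ (Q - cq)                        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                         # got a reflection: flip the last singular vector
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cq - R @ cp                                  # translation that aligns the centroids
    return R, t
```

With noise-free correspondences this recovers R and t exactly; with real data it gives the transform that minimizes the sum of squared distances.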
EDIT: answer for when the 3D positions of the points in the world frame are unspecified
Indeed, for this procedure to work, you need to know (or better, to specify) the 3D coordinates of the points in your world reference frame. As stated in your comment, you only know the points are in a plane but don't have their coordinates in that plane.
Here is a possible solution:
Take the selected 3D points in camera reference frame, let's call them q'i.
Fit a plane to these points, for example as described in https://www.ilikebigbits.com/2015_03_04_plane_from_points.html. The result will be a normal vector n. To fully specify the plane, you also need to choose a point, for example the centroid (average) of q'i.
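As an alternative to the linked article's method, here is a sketch of a different but standard least-squares plane fit: the SVD of the centred points, whose last right-singular vector is the normal (this minimizes orthogonal distances). The sign of the normal is arbitrary, so the sketch orients it toward the camera centre (the origin of the camera frame); that orientation is my own assumption, chosen so the later world Z axis points up toward the camera. The function name fit_plane is hypothetical.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through (N, 3) camera-frame points.

    Returns (centroid, unit normal). The normal is flipped, if needed, so it
    points toward the camera centre at (0, 0, 0) (an assumed convention).
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = Vt[-1] / np.linalg.norm(Vt[-1])         # direction of least variance
    if normal @ (-centroid) < 0:                     # make the normal face the camera
        normal = -normal
    return centroid, normal
```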
As the points will not lie perfectly in the plane, project them onto it, for example as described in "How to project a point onto a plane in 3D?". Let's call these projected points qi.
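The projection itself is one line: subtract from each point its signed distance to the plane times the unit normal. A NumPy sketch (the function name is hypothetical), taking the plane as the point/normal pair from the fitting step:

```python
import numpy as np

def project_onto_plane(points, centroid, normal):
    """Orthogonal projection of (N, 3) or (3,) points onto the plane (centroid, normal)."""
    pts = np.asarray(points, dtype=float)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    dist = (pts - centroid) @ n                      # signed distance of each point to the plane
    return pts - np.multiply.outer(dist, n)          # remove the out-of-plane component
```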
At this point you have a set of 3D points, qi, that lie on a perfect plane, which should correspond closely to the ground plane (z=0 in world coordinate frame). The coordinates are in the camera reference frame, though.
Now we need to specify an origin and the direction of the x and y axes in this ground plane. You don't seem to have any criteria for this, so an option is to arbitrarily set the origin just "below" the camera center, and align the X axis with the camera optical axis. For this:
Project the point (0,0,0) onto the plane, as in the projection step above. Call this o. Project the point (0,0,1) onto the plane and call it a. Compute the vector a-o, normalize it and call it i.
o is the origin of the world reference frame, and i is the X axis of the world reference frame, both in camera coordinates. Let j = n × i (cross product). j is the Y axis, and we are almost finished.
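A sketch of building o, i and j from the fitted plane, all in camera coordinates. It assumes the optical axis is not perpendicular to the ground plane; in that case a - o is zero and another X direction has to be chosen, so the sketch raises an error. The function name ground_frame is hypothetical.

```python
import numpy as np

def ground_frame(centroid, normal):
    """Origin o and axes i, j of the ground frame, all in camera coordinates.

    o: camera centre (0, 0, 0) projected onto the plane.
    i: projected optical-axis direction, normalized (world X).
    j: n x i (world Y); with i and j, n becomes the world Z axis.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    c = np.asarray(centroid, dtype=float)

    def project(p):
        return p - ((p - c) @ n) * n                 # orthogonal projection onto the plane

    o = project(np.zeros(3))
    a = project(np.array([0.0, 0.0, 1.0]))
    i = a - o
    if np.linalg.norm(i) < 1e-9:
        raise ValueError("optical axis is perpendicular to the plane; choose another X direction")
    i = i / np.linalg.norm(i)
    j = np.cross(n, i)
    return o, i, j
```

For a ground plane y = 1.5 in the camera frame with its normal facing the camera, (0, -1, 0), this gives o = (0, 1.5, 0), i = (0, 0, 1) and j = (-1, 0, 0).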
Now obtain the X-Y coordinates of the points qi in the world reference frame by projecting them onto i and j, measured from the origin o. That is, take the dot product of each qi - o with i to get the X values, and with j to get the Y values. The Z values are all 0. Call these (X, Y, 0) coordinates pi.
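A sketch of this step, measuring each in-plane point relative to o so that o really is the world origin. The resulting pi (world) and qi (camera) arrays are the P and Q inputs for the rigid-transform estimation from the first part of the answer. The function name is hypothetical.

```python
import numpy as np

def ground_coordinates(q, o, i, j):
    """World coordinates (X, Y, 0) of in-plane camera-frame points q, shape (N, 3).

    X and Y are measured from the ground-frame origin o along the unit axes i and j.
    """
    d = np.asarray(q, dtype=float) - o               # offset of each point from the origin
    return np.column_stack([d @ i, d @ j, np.zeros(len(d))])
```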
Use these values of pi and qi to estimate R and t, as in the first part of the answer!
Maybe there is a simpler solution. Also, I haven't tested this, but I think it should work. Hope this helps.
I've been playing around with both Matlab and Apple's documentation regarding CMRotationMatrix for weeks.
I've found that I could easily re-create CMRotationMatrix by calculating it with Roll, Yaw & Pitch.
However, I've found no resources/documentation on how to create a Rotation Matrix from XYZ rotations from either gravity or userAcceleration.
All I found was how they create a 4x4 matrix in their VideoSnake demo.
So my question is: does anyone have any input on how to create a 3x3 matrix from XYZ rotations?
To begin with, rotation matrices have wide applications in physics, geometry and computer graphics (see Wikipedia). Since your question mentions gravity and userAcceleration, it falls on the physics side, where the same principles are used in, for example, spacecraft navigation.
On the core question of XYZ rotations: a rotation matrix describes a rotation about the origin of the XYZ axes, without reference to any particular starting angle.
The key point is that these axes are abstract, so the XYZ values need to be expressed as direction vectors before they mean anything in real-world coordinates.
Only then can the rotation matrix be related to those XYZ points.
To conclude:
The purpose of the direction vector is that the rotation matrix converts it into the equivalent direction in another frame, which can then be expressed in platform-local (device) coordinates.
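As a concrete illustration of turning XYZ rotation angles into a 3x3 matrix and applying it to a direction vector, here is a NumPy sketch. The convention is an assumption: right-handed axes, counterclockwise-positive angles, and composition R = Rz @ Ry @ Rx (X applied first). Core Motion's CMRotationMatrix may use a different axis order or map in the opposite direction (reference frame to device rather than device to reference), so compare against attitude.rotationMatrix before relying on it. The function name is hypothetical.

```python
import numpy as np

def rotation_from_xyz(rx, ry, rz):
    """3x3 rotation matrix for rotations (radians) about X, then Y, then Z.

    Assumed convention: right-handed axes, counterclockwise-positive angles,
    composed as R = Rz @ Ry @ Rx.
    """
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx
```

Multiplying the result by a direction vector rotates that direction. For example, a 90 degree rotation about Z maps [1, 0, 0] to [0, 1, 0].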
I'm working on an object tracking application using OpenCV. I want to convert my pixel coordinates to world coordinates to get more meaningful information. I have read a lot about computing the perspective transform matrix, and I know about cv2.solvePnP. But I feel like my case should be special, because I'm tracking a runner on a track-and-field runway with the runway orthogonal to the camera's z-axis. I will set up the camera to ensure this.
If I just pick two points on the runway edge, I can calculate a linear conversion from pixels to world coords at that specific height (ground level) and distance from the camera (i.e. along that line). Then I reason that the runner will run on a line parallel to the runway at a different height and slightly different distance from the camera, but the lines should still be parallel in the image, because they will both be orthogonal to the camera z-axis. With all those constraints, I feel like I shouldn't need the normal number of points to track the runner on that particular axis. My gut says that 2-3 should be enough. Can anyone help me nail down the method here? Am I completely off track? With both height and distance from camera essentially fixed, shouldn't I be able to work with a much smaller set of correspondences?
Thanks, Bill
So, I think I've answered this one myself. It's true that only two correspondence points are needed given the following assumptions.
Assume:
World coordinates are set up with the X-axis and Z-axis parallel to the ground plane (Y is vertical, so ground points have Y=0). The X-axis is parallel to the runway.
Camera is translated and possibly rotated about the X-axis (angled downward), but not rotated around the Y-axis (camera plane parallel to the runway and X-axis) or the Z-axis (camera is level with respect to the ground).
Camera intrinsic parameters are known from camera calibration.
Method:
Pick two points in the ground plane with known coordinates in world and image. For example, two points on the runway edge as mentioned in the original post. The line connecting the points in world coordinates should not be parallel to either the X or Z axis.
Since Y=0 for these points, ignore the second column of the rotation/translation matrix, reducing the projection to a planar homography transform (3x3 matrix). Now we have 9 degrees of freedom.
The rotation assumptions will enforce a certain form on the rotation/translation matrix. Namely, the first column and first row will be the identity (1,0,0). This further reduces the number of degrees of freedom in the matrix to 5.
Constrain the values of the second column of the matrix such that cos^2(theta)+sin^2(theta) = 1. This reduces the number of unknowns to only 4. Two correspondence points will give us the 4 equations we need to calculate the homography matrix for the ground plane.
Factor out the camera intrinsic parameter matrix from the homography matrix, leaving the rotation/translation matrix for the ground plane.
Due to the rotation assumptions made earlier, the ignored column of the rotation/translation matrix can be easily constructed from the third column of the same matrix, which is the second column in the ground plane homography matrix.
Multiply back out with the camera intrinsic parameters to arrive at the final universal projection matrix (from only 2 correspondence points!)
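Under the same assumptions, the pose can also be recovered from a small linear system instead of by factoring a homography. The sketch below is my own linear reformulation, not the author's exact procedure. With R a rotation about X by theta (c = cos theta, s = sin theta, sign convention assumed), a ground point (X, 0, Z) projects to normalized image coordinates x = (X + tx)/(cZ + tz) and y = (-sZ + ty)/(cZ + tz). Each point gives two equations that are linear in (c, s, tx, ty, tz); two points give four, and c^2 + s^2 = 1 closes the system, which can have two roots. The function name two_point_pose is hypothetical.

```python
import numpy as np

def two_point_pose(world_xz, pixels, K):
    """Candidate (R, t) from two ground points (X, Z) with Y = 0 and their pixels (u, v).

    Assumes R = [[1, 0, 0], [0, c, -s], [0, s, c]] (rotation about world X only)
    and known intrinsics K. Returns candidates with positive depth for both points.
    """
    K_inv = np.linalg.inv(K)
    rows, rhs = [], []
    for (X, Z), (u, v) in zip(world_xz, pixels):
        x, y, w = K_inv @ np.array([u, v, 1.0])
        x, y = x / w, y / w                          # normalized image coordinates
        # Unknowns (c, s, tx, ty, tz):
        # x * (c*Z + tz) = X + tx   and   y * (c*Z + tz) = -s*Z + ty
        rows.append([x * Z, 0.0, -1.0, 0.0, x]); rhs.append(X)
        rows.append([y * Z, Z, 0.0, -1.0, y]); rhs.append(0.0)
    A, b = np.array(rows), np.array(rhs)
    u0 = np.linalg.lstsq(A, b, rcond=None)[0]        # one particular solution
    n = np.linalg.svd(A)[2][-1]                      # null-space direction of the 4x5 system
    # Choose lam in u0 + lam * n so that c^2 + s^2 = 1 (a quadratic in lam).
    qa = n[0] ** 2 + n[1] ** 2
    qb = 2.0 * (u0[0] * n[0] + u0[1] * n[1])
    qc = u0[0] ** 2 + u0[1] ** 2 - 1.0
    disc = np.sqrt(max(qb ** 2 - 4.0 * qa * qc, 0.0))
    candidates = []
    for lam in ((-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa)):
        c, s, tx, ty, tz = u0 + lam * n
        R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
        t = np.array([tx, ty, tz])
        if all(c * Z + tz > 0 for _, Z in world_xz):  # both points in front of the camera
            candidates.append((R, t))
    return candidates
```

The full projection matrix for a candidate is then K @ np.column_stack([R, t]). If both candidates pass the positive-depth check, a third point or the known sign of the tilt can be used to pick between them.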
My test implementation has worked quite well. Of course, it's sensitive to the accuracy of the two correspondence points provided, but that's kind of a given.
Given a point on a plane A, I want to be able to map to its corresponding point on plane B. I have a set of N corresponding pairs of reference points between the two planes, however, the overall mapping is not a simple affine transform (no homographies for me).
Things I have tried:
For a given point, find the three closest reference points in plane A, compute barycentric coordinates in that triangle, and then apply them to the corresponding reference points in plane B. How it failed: sometimes the three closest points were nearly collinear, so the errors were huge. Also, the mapping was not consistent when crossing triangle borders. It was very "jittery."
Compute all possible triangles given the N reference points (O(N^3) of them). Order them by size. For the given point, find the smallest triangle that contains it. This fixes the collinear-points problem, but was still extremely jittery and slow.
Start with a triangulated plane A. Iterate through the reference points, adding each one to the reference plane. Every time you add a point it exists in at least one triangle. Break that triangle into three triangles using the new reference point as a vertex. You end up with plane A triangulated so you can map from plane A to plane B with ease. Issues: You can prove that every triangle will have a point that is on the edge of the planes. This results in huge errors if your reference points are far from the edge of the planes.
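The core step shared by these attempts, transferring a point through one pair of corresponding triangles with barycentric coordinates, can be sketched as below. It also rejects nearly degenerate triangles, which is the near-collinear failure mode described in the first attempt. How the triangle is chosen (nearest points, smallest containing triangle, incremental triangulation) is left to the caller; the function name and tolerance value are hypothetical.

```python
import numpy as np

def barycentric_map(p, tri_a, tri_b, tol=1e-9):
    """Map 2-D point p from triangle tri_a (plane A) to triangle tri_b (plane B).

    tri_a, tri_b: (3, 2) arrays of corresponding vertices.
    Raises ValueError if tri_a is (nearly) degenerate, i.e. its vertices are collinear.
    """
    a = np.asarray(tri_a, dtype=float)
    b = np.asarray(tri_b, dtype=float)
    T = np.column_stack([a[1] - a[0], a[2] - a[0]])  # 2x2 matrix of edge vectors
    if abs(np.linalg.det(T)) < tol:                  # |det| is twice the triangle's area
        raise ValueError("reference triangle is degenerate (nearly collinear points)")
    l1, l2 = np.linalg.solve(T, np.asarray(p, dtype=float) - a[0])
    weights = np.array([1.0 - l1 - l2, l1, l2])      # barycentric coordinates of p in tri_a
    return weights @ b                               # same weights applied to plane B vertices
```

For example, mapping (0.25, 0.25) from the unit right triangle to the same triangle scaled by 2 gives (0.5, 0.5).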
I feel like this should be a fairly standard problem. Are there standard algorithms/libraries for this?
There you go, my friend. I have used it myself and can only recommend you give it a try.
Khan Academy - Matrix transformations
Understanding how we can map one set of vectors to another set. Matrices used to define linear transformations
https://www.khanacademy.org/math/linear-algebra/matrix_transformations
I am using glMatrix to write WebGL code and want to get the eye position, focal point and up direction from the existing projection and view matrices (kind of the reverse of the lookAt function). Is there any way to do this?
I didn't implement one, no. I'm not even sure that you could decompose it into the original vectors, for that matter. The lookAt point could be anywhere along a ray from the origin, and how would you determine what the appropriate up vector was? I'm thinking this is a one-way algorithm (just too lazy to prove it!)
Beyond that, however, I question whether you would want to do this even if there were a method for it. I'm willing to bet that it's almost always more beneficial to track the values you're using and manipulate them, rather than try to pull them back and forth between matrix and vectors.
Yes and no: yes, you can invert the model view transformation, and no, you will not get all three vectors back exactly.
The model view transformation of lookAt is very similar to the connectTo operation as used in CSG models. It is mounting your scene in front of your camera. This is done by translation and three axis rotations. The eye point is translated to (0,0,0) and all further rotation is done around it. You can easily derive the eye point by transforming (0,0,0) with the inverse matrix.
But the center point is only used to set the viewing axis along -Z; in OpenGL the eye faces -Z. The distance between center and eye is lost, so you can easily get a center point along your viewing axis if you choose the distance yourself. Say we want a distance d: transforming (0,0,-d) with the inverse matrix gives a valid center point, but not necessarily the original one. The center point defines only two rotation angles, the camera pan and tilt.
Even worse is the reconstruction of the up vector. It is only used for the roll angle of the camera, and thus for only one scalar value. So for the inverse transformation you could choose not only any positive value along the Y axis, but any point in the YZ plane with a positive Y value. To get an up vector that is exactly perpendicular to the viewing axis and of length 1, just transform (0,1,0) with the inverse matrix. Remember to transform it as a vector this time (w = 0), not as a point.
Now we have eye, center and up reconstructed in a way that gives exactly the same result from lookAt next time. But since this matrix contains only 6 values of information (3 for translation, plus pan, tilt and roll), we had to choose the 3 values that were lost (the center-to-eye distance, and the length and angle of the up vector in the camera's YZ plane).
The model view matrix can of course do other transformation (any affine) but the lookAt function is using this matrix only for translation and rotation. It is adjusting the scene in front of the camera without distorting it.
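The reconstruction described above can be sketched in NumPy (glMatrix provides mat4.lookAt and mat4.invert for the same steps in JavaScript). The look_at helper below is a re-implementation of the gluLookAt-style view matrix, not glMatrix's code, and both function names are hypothetical. The distance d is the free choice mentioned above; the recovered triple reproduces the same view matrix even though center and up may differ from the originals.

```python
import numpy as np

def look_at(eye, center, up):
    """4x4 view matrix in the gluLookAt layout (camera looks down -Z)."""
    eye, center, up = (np.asarray(v, dtype=float) for v in (eye, center, up))
    f = center - eye
    f /= np.linalg.norm(f)                           # forward
    s = np.cross(f, up)
    s /= np.linalg.norm(s)                           # side (camera X)
    u = np.cross(s, f)                               # true up (camera Y)
    M = np.eye(4)
    M[0, :3], M[1, :3], M[2, :3] = s, u, -f
    M[:3, 3] = -M[:3, :3] @ eye                      # move the eye to the origin
    return M

def decompose_look_at(view, d=1.0):
    """Recover an (eye, center, up) triple that reproduces `view` via look_at.

    d is the eye-to-center distance, which the matrix does not store.
    """
    inv = np.linalg.inv(view)
    eye = (inv @ [0.0, 0.0, 0.0, 1.0])[:3]           # transform as a point: w = 1
    center = (inv @ [0.0, 0.0, -d, 1.0])[:3]         # point d units along the viewing axis
    up = (inv @ [0.0, 1.0, 0.0, 0.0])[:3]            # transform as a vector: w = 0
    return eye, center, up
```

Passing the recovered eye, center and up back into look_at gives the original view matrix, which is the property the answer describes.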