Using PyDrake AutoDiff to compute the Jacobian of a vector in some frame - drake

Suppose I have a vector n_F which is fixed in a frame F (for example, a normal vector on the surface of a fingertip that is fixed in the local fingertip frame). The vector n_W(q) depends on the configuration via the expression n_W = R_WF * n_F, where the rotation matrix R_WF depends on the configuration via the forward kinematics map.
My question is how to recover the Jacobian Dn_W of n_W with respect to q (which will be some 3-by-n matrix) using Drake's AutoDiff. I figure there must be some internal implementation of this, since angle constraints between vectors expressed in different frames can be enforced during IK, and I'm assuming IK is solved with a gradient-based solver. However, I'm having trouble getting started because AutoDiffXd seems to operate only on scalar functions.
Alternatively, if there's some simple way to express this Jacobian using available Drake functions, that would also suffice for my application - I haven't been able to work out a clean expression for this by hand.

I have a (non-AutoDiff) solution currently implemented that does not involve rotations.
Let J0 denote the translational velocity Jacobian of the origin of frame F, and J1 the translational velocity Jacobian of the point whose position in F is n_F, with both velocities measured and expressed in the world frame. The desired Jacobian Dn_W is then simply J1 - J0.
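As a minimal sketch (not necessarily the exact code used), this construction maps onto Drake's CalcJacobianTranslationalVelocity; here plant, context, the fingertip frame frame_F, and the 3-vector n_F are assumed to already exist:

import numpy as np
from pydrake.multibody.tree import JacobianWrtVariable

def normal_jacobian_in_world(plant, context, frame_F, n_F):
    frame_W = plant.world_frame()
    # J1: Jacobian of the point located at n_F (expressed in F),
    # measured and expressed in the world frame, w.r.t. qdot.
    J1 = plant.CalcJacobianTranslationalVelocity(
        context, JacobianWrtVariable.kQDot,
        frame_F, n_F.reshape(3, 1), frame_W, frame_W)
    # J0: Jacobian of the origin of frame F.
    J0 = plant.CalcJacobianTranslationalVelocity(
        context, JacobianWrtVariable.kQDot,
        frame_F, np.zeros((3, 1)), frame_W, frame_W)
    return J1 - J0  # Dn_W, a 3 x n matrix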

Related

Why do we need to convert Rotational vector to Rotational matrix to calculate angle

I am not finding a proper source to understand why we need to convert a rotation vector to a rotation matrix [in the context of calculating the angle between two ArUco markers].
We are using
rmat, _ = cv2.Rodrigues(rvec)
rmat1, _ = cv2.Rodrigues(rvec1)
relative_rmat = rmat1 @ rmat.T
My questions are:
Why are we converting the rotation vector to a rotation matrix?
And can I please get the source of relative_rmat's formula? I am trying to understand the geometrical concept.
I have tried to understand it from Wikipedia, but I am getting more confused. It would be helpful if anyone could provide a source for the concept behind both questions.
Translation of a rigid body (or tvec as used in OpenCV conventions) lives in 3D Euclidean space. We call this the 'configuration space'.
Assume there is a rigid body at a point we call pos1. A 3x1 vector pos1 = [x1, y1, z1] completely defines its position uniquely. The term 'unique' means that there is no other way to define pos1 without [x1, y1, z1], and also if you go to [x1, y1, z1], it will always be pos1.
Any attitude (a rotation of a rigid body in three-dimensional space) can be represented in many different ways. However, attitudes live in the special orthogonal group SO(3). This is the configuration space for rotations, and elements of this space are what you were referring to as 'rotation matrices'.
All other ways of defining a rotation, like Euler angles, rotation vectors (rvec in OpenCV), or quaternions, are 'local parameterizations' of a given rotation matrix. So they have several issues, including not being unique at some rotations. The Gimbal Lock Wikipedia page has some nice visualizations.
To avoid such issues, the simplest way is to use rotation matrices. Even though they seem complex, rotation matrices can be much easier to work with once you get used to them. Given the properties of SO(3), the inverse of a rotation matrix is the transpose of that matrix (which is why you get the rmat.T when you are trying to get the relative rotation).
Let's assume that you have two markers named 'marker' and 'marker1' that correspond to rvec and rvec1 respectively. Your rmat is the rotation of 'marker' with respect to the camera frame, or how a vector in 'marker' can be represented in the camera frame (I know this can be confusing, but this is how it is defined, so stay with me).
Similarly, rmat1 is how a vector in 'marker1' is represented in the camera frame. Also, keep in mind that these matrices are directional, meaning we need inverse(rmat1) if we want to find how a vector in the camera frame is represented in the 'marker1' frame.
Your relative_rmat is how you represent a vector in 'marker' in the 'marker1' frame. You cannot hop directly between markers; you always need to go through a common frame. First you have to transform a vector in 'marker' to the camera frame, and then transform that to the 'marker1' frame. We can write that as
relative_rmat = rmat1 @ inverse(rmat)
But as I said above, a special property of a rotation matrix dictates that its inverse is the same as its transpose. So we write it as:
relative_rmat = rmat1 @ rmat.T
The order matters here: you should always start with the first rotation and pre-multiply the subsequent rotations. If you want to go the other way, you just need to take the inverse of relative_rmat. With simple matrix math, we can see that this is the same as following the rotations manually, as described above.
relative_rmat_inverse
= relative_rmat.T
= (rmat1 @ rmat.T).T
= (rmat.T).T @ rmat1.T
= rmat @ rmat1.T
It is hard to detail everything here, and it can take a while to understand the math behind the rotation matrices. I would recommend this as a good reference, but it can get very technical depending on your background. If you are new to this, start with basics of robotics, coordinate transformations, and then move to SO(3).
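As a concrete sketch (assuming rvec and rvec1 come from an ArUco pose estimate for 'marker' and 'marker1'), the relative rotation can be computed and sanity-checked like this:

import cv2
import numpy as np

rmat, _ = cv2.Rodrigues(rvec)    # 'marker'  -> camera frame
rmat1, _ = cv2.Rodrigues(rvec1)  # 'marker1' -> camera frame

# Relative rotation between the two markers, as derived above.
relative_rmat = rmat1 @ rmat.T

# Sanity checks: a rotation matrix is orthogonal with determinant +1.
assert np.allclose(relative_rmat @ relative_rmat.T, np.eye(3), atol=1e-6)
assert np.isclose(np.linalg.det(relative_rmat), 1.0, atol=1e-6)

# Convert back to a rotation vector; its norm is the relative angle in radians.
relative_rvec, _ = cv2.Rodrigues(relative_rmat)
angle = np.linalg.norm(relative_rvec)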

Direct Transcription/Collocation for Rigid Body Rotation Trajectory in SO(3)

I am trying to use DirectTranscription and DirectCollocation to find the optimal trajectory in SO(3) for a free-floating box i.e. rotating a satellite from one pose to another. To start with, I'm using the Example 6.12: Reorientation of a Rigid Body from Betts, J. T. (2010). Practical Methods for Optimal Control and Estimation Using Nonlinear Programming.
Here, the body inertia is given as Ixx = 5621, Iyy = 4547, Izz = 2364. I've made a simple box.urdf with these inertias.
The initial rotation quaternion is [1,0,0,0] and the final rotation quaternion is [cos(phi/2),sin(phi/2),0,0] (phi being 150deg) with the initial and final rotation velocities being zero. The position and its derivative are set as zero for initial and final states.
As this is basically a double-integrator problem in multiple axes on SO(3), the optimal control is bang-bang.
I am using the get_applied_generalized_force_input_port() of the MultibodyPlant for the actuation (u) and hence this port index is used to initialize the trajectory optimization.
In both DirectTranscription and DirectCollocation, the optimization fails, but for different reasons:
DirectTranscription: This fails due to infeasible constraints. But the same constraints seem to work for DirectCollocation, as that gives another error. The code snippet used:
NDirTrans = 300
prog = DirectTranscription(
    plant_d, trajOptContext, NDirTrans,
    input_port_index=plant_d.get_applied_generalized_force_input_port().get_index())
prog.AddBoundingBoxConstraint(x0, x0, prog.initial_state())
prog.AddBoundingBoxConstraint(xf, xf, prog.final_state())
# Symmetric torque bounds of +/-50 at every knot point.
for u_index in range(prog.input().size):
    prog.AddConstraintToAllKnotPoints(prog.input()[u_index] <= 50)
    prog.AddConstraintToAllKnotPoints(prog.input()[u_index] >= -50)
result = Solve(prog)
DirectCollocation: This fails due to the following error:
RuntimeError: Encountered singular articulated body hinge inertia for body node index 1.
Please ensure that this body has non-zero inertia along all axes of motion.
I verified that the body at node index 1 does have inertia parameters along all the axes, and hence I'm stuck here.
A google colab script to reproduce the errors can be seen here. It is not needed to run the program to see the errors as the cell outputs are saved and can be viewed without running the program. This includes viewing the Spatial Inertia matrix for Body with Node Index 1.
I don't think direct transcription would work for your formulation. Direct transcription uses explicit Euler method, which says x[n+1] = x[n] + xdot[n] * dt. Since part of your state is the quaternion, you have
quat[n+1] = quat[n] + quat_dot[n] * dt
On the other hand, the quaternion must represent an element of SO(3), which means it has to be a unit quaternion (a point on the unit sphere in 4D), and the time derivative of a unit quaternion is perpendicular to that quaternion. As a result, quat[n] + quat_dot[n] * dt lies on the tangent plane of the unit sphere at quat[n]. The only intersection between this tangent plane and the unit sphere is quat[n] itself, hence the only solution satisfying the explicit Euler integration is quat[n+1] = quat[n], which means all the quaternions are the same along the trajectory; hence the infeasibility message you got.
When you use direct collocation, the problem is mitigated, but it is still not the right way to do integration/interpolation on SO(3) (direct collocation doesn't always satisfy the SO(3) constraint). You could check the paper on the traditional way to integrate on SO(3).
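A small numeric illustration of the argument above (not Drake code): an explicit Euler step q + qdot * dt always leaves the unit sphere unless qdot * dt = 0, because qdot is perpendicular to q.

import numpy as np

q = np.array([1.0, 0.0, 0.0, 0.0])   # unit quaternion [w, x, y, z]
w = np.array([0.0, 0.0, 1.0])        # body angular velocity (rad/s)
# Quaternion kinematics: qdot = 0.5 * q ⊗ [0, w]
qdot = 0.5 * np.hstack([-q[1:] @ w, q[0] * w + np.cross(q[1:], w)])
dt = 0.01
q_next = q + qdot * dt
print(np.linalg.norm(q_next))        # > 1: the Euler step is off the manifold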

What is the recommended way of constraining floating base quaternion position and angular velocity?

For objects with a floating base, the rotation in the generalized position is expressed as a 4D quaternion. However, the rotational part of the generalized velocity is still expressed as a 3D angular velocity vector.
Is there a recommended way of constraining them for, e.g., backward Euler?
prog.AddConstraint(eq(q[t+1], q[t] + h[t] * qd[t+1]))
Is it appropriate to convert the angular velocity into quaternion form, e.g. here?
Or as John T. Betts puts it in Practical Methods for Optimal Control and Estimation Using Nonlinear Programming
I think for your use case, you want to impose the integration constraint between adjacent waypoints in planning. In this case, I wouldn't suggest using the Euler integration q[n] = q[n+1] - qdot[n+1] * dt directly. The reason is that the quaternion should always satisfy the unit length constraint (i.e., q[n] and q[n+1] are both on the surface of the unit sphere in 4D). Hence the time derivative qdot lies in the tangent plane of this unit sphere, so q[n+1] - qdot[n+1] * dt is on that tangent plane. No point (other than q[n+1]) on the tangent plane is also on the unit sphere. Hence the constraint q[n+1] = q[n] + qdot[n+1] * dt
can only be satisfied when dt = 0, once you couple it with the constraints |q[n]| = |q[n+1]| = 1. Pictorially, you could use the figure in the "tangent to a circle" section of https://www.toppr.com/guides/maths/circles/tangents-to-the-circle/. Notice that q[n+1] - qdot[n+1] * dt is on the tangent line of the circle, and q[n] is on the circle, so you can't have q[n] = q[n+1] - qdot[n+1] * dt.
Instead, I would suggest to consider the following condition
q[n+1] = q[n] ⊗ Δq
where q[n] is the quaternion of the floating base at the n'th waypoint and q[n+1] is the quaternion of the floating base at the (n+1)'th waypoint. ⊗ is the Hamilton product of the two quaternions q[n] and Δq. So here you need to compute Δq from the angular velocity.
Suppose that the average angular velocity between the n'th and n+1'th waypoints is ω ∈ ℝ³. This means that the floating base rotates about an axis a = ω/|ω| at rate |ω|, hence
Δq = [cos(|ω|Δt/2), a*sin(|ω|Δt/2)]
See https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation#Using_quaternion_as_rotations for an explanation on the equation above.
If you use ω[n+1] as the average angular velocity from the n'th waypoint to n+1'th waypoint (analogous to using v[n+1] in backward Euler), then your integration constraint is
q[n+1] = q[n] ⊗ [cos(|ω[n+1]|Δt/2), a*sin(|ω[n+1]|Δt/2)]
You could also use the other quantities as the average angular velocity, for example (ω[n] + ω[n+1])/2.
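A minimal numpy sketch of the integration rule above (not a Drake API call), assuming the angular velocity ω is expressed in the body frame so that Δq right-multiplies q[n]:

import numpy as np

def quat_product(a, b):
    # Hamilton product of quaternions in [w, x, y, z] order.
    aw, av = a[0], a[1:]
    bw, bv = b[0], b[1:]
    return np.hstack([aw * bw - av @ bv,
                      aw * bv + bw * av + np.cross(av, bv)])

def integrate_quaternion(q_n, w_avg, dt):
    # q[n+1] = q[n] ⊗ [cos(|ω|Δt/2), a*sin(|ω|Δt/2)], with a = ω/|ω|
    speed = np.linalg.norm(w_avg)
    if speed < 1e-12:          # no rotation over this step
        return q_n
    axis = w_avg / speed
    half = 0.5 * speed * dt
    dq = np.hstack([np.cos(half), axis * np.sin(half)])
    return quat_product(q_n, dq)

In a transcription, the corresponding constraint would be the residual q[n+1] - integrate_quaternion(q[n], w[n+1], dt) == 0 between every pair of adjacent waypoints (for example via prog.AddConstraint with a small wrapper function).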
There is another way to look at the quaternion integration, as given in the book by Betts [1], but it has two caveats to look out for. As given in the book, we cannot use the ODEs (6.123a - 6.123d) directly for quaternion integration. Rather, we can use the DAEs given in the book, equations 6.126a - 6.126c and 6.126g. These work because we can view the quaternion as a version of the axis-angle representation:
quat = (cos(phi/2), unit_vec * sin(phi/2))
Once we integrate the vector part (using either Euler or higher-order Runge-Kutta methods), the scalar part is determined by the unit-norm constraint on rotation quaternions. The vector part can be integrated using the quaternion-derivative formulation from the angular velocity, as given in Betts [1].
This process has the following two caveats:
1. This method is only accurate for very small time steps Δt.
2. There is a singularity in this method. Every time we use only 3 numbers to represent a rotation (here, we use only the vector part of the quaternion for integration), we introduce a singularity. The scalar part of the axis-angle quaternion is indeed determined by the unit-length constraint, except at an angle of -pi or pi; at those points the mapping does not work. For an in-depth derivation/understanding of why this happens, check the determinant in equation 8 of [2].
The singularity occurs when the angle in the axis-angle representation is at pi/-pi. It is important to note that this does not necessarily coincide with crossing pi/-pi about an individual x/y/z axis, although that can also happen. Run Betts' Example 6.8 for a rotation of more than pi about the x-axis to see this in action.
Hence, if you can ensure that the angle in your axis-angle representation will not cross pi/-pi during the trajectory, you can use the DAE-based integration method given in Betts for a simpler implementation. But as soon as you cross pi/-pi, this will not work; then you will have to use proper quaternion integration, as given by Hongkai Dai, with certain care in the implementation. You can check his other answers on this topic to find the implementation details. Also check the UnitQuaternionConstraint in Drake for this.
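As a minimal sketch (not the Drake helper itself), a unit-norm constraint on a quaternion decision variable can be added with MathematicalProgram like this; Drake's UnitQuaternionConstraint / AddUnitQuaternionConstraintOnPlant serve the same purpose on a plant's generalized positions:

import numpy as np
from pydrake.solvers import MathematicalProgram, Solve

prog = MathematicalProgram()
q = prog.NewContinuousVariables(4, "quat")   # [w, x, y, z]
prog.AddConstraint(q.dot(q) == 1)            # unit-length constraint
prog.SetInitialGuess(q, np.array([1.0, 0.0, 0.0, 0.0]))
result = Solve(prog)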
[1] - Betts, J. T. (2010). Practical Methods for Optimal Control and Estimation Using Nonlinear Programming. Society for Industrial and Applied Mathematics.
[2] - Yang, Y. (2010). Quaternion based model for momentum biased nadir pointing spacecraft. Aerospace Science and Technology, 14(3), 199–202. https://doi.org/10.1016/j.ast.2009.12.006

3D reconstruction from two calibrated cameras - where is the error in this pipeline?

There are many posts about 3D reconstruction from stereo views of known internal calibration, some of which are excellent. I have read a lot of them, and based on what I have read I am trying to compute my own 3D scene reconstruction with the below pipeline / algorithm. I'll set out the method then ask specific questions at the bottom.
0. Calibrate your cameras:
This means retrieving the camera calibration matrices K1 and K2 for Camera 1 and Camera 2. These are 3x3 matrices encapsulating each camera's internal parameters: focal length and principal point offset / image centre. These don't change; you should only need to do this once per camera, as long as you don't zoom or change the resolution you record in.
Do this offline. Do not argue.
I'm using OpenCV's CalibrateCamera() and checkerboard routines, but this functionality is also included in the Matlab Camera Calibration toolbox. The OpenCV routines seem to work nicely.
1. Fundamental Matrix F:
With your cameras now set up as a stereo rig, determine the fundamental matrix (3x3) of that configuration using point correspondences between the two images/views.
How you obtain the correspondences is up to you and will depend a lot on the scene itself.
I am using OpenCV's findFundamentalMat() to get F, which provides a number of options method wise (8-point algorithm, RANSAC, LMEDS).
You can test the resulting matrix by plugging it into the defining equation of the fundamental matrix: x'Fx = 0, where x' and x are the raw image point correspondences (x, y) in homogeneous coordinates (x, y, 1) and one of the three-vectors is transposed so that the multiplication makes sense. The nearer to zero for each correspondence, the better F obeys its relation. This is equivalent to checking how well the derived F actually maps from one image plane to another. I get an average deflection of ~2px using the 8-point algorithm.
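For reference, a tiny residual check along these lines might look like the following (pts1 and pts2 are hypothetical Nx2 arrays of pixel correspondences from image 1 and image 2, and F comes from findFundamentalMat()):

import numpy as np

x1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # (x, y, 1) homogeneous
x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
# Per-correspondence epipolar residual |x2' F x1|; smaller is better.
residuals = np.abs(np.einsum('ij,jk,ik->i', x2, F, x1))
print(residuals.mean())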
2. Essential Matrix E:
Compute the Essential matrix directly from F and the calibration matrices.
E = K2.T * F * K1
3. Internal Constraint upon E:
E should obey certain constraints. In particular, if decomposed by SVD as U S V.t, then its singular values should be (a, a, 0): the first two diagonal elements of S should be equal and the third zero.
I was surprised to read here that if this is not true when you test for it, you might choose to fabricate a new Essential matrix from the prior decomposition like so: E_new = U * diag(1,1,0) * V.t, which is of course guaranteed to obey the constraint. You have essentially set S = diag(1, 1, 0) artificially.
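A short sketch of steps 2 and 3 with numpy (K1, K2 and F are the 3x3 matrices described above):

import numpy as np

E = K2.T @ F @ K1                     # step 2: essential matrix from F

U, S, Vt = np.linalg.svd(E)           # step 3: enforce the (a, a, 0) constraint
E_fixed = U @ np.diag([1.0, 1.0, 0.0]) @ Vt
# Alternatively, keep the scale: np.diag([(S[0] + S[1]) / 2] * 2 + [0.0])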
4. Full Camera Projection Matrices:
There are two camera projection matrices P1 and P2. These are 3x4 and obey the x = PX relation. Also, P = K[R|t] and therefore K_inv.P = [R|t] (where the camera calibration has been removed).
The first matrix P1 (excluding the calibration matrix K) can be set to [I|0]; then P2 (excluding K) is [R|t].
Compute the Rotation and translation between the two cameras R, t from the decomposition of E. There are two possible ways to calculate R (U*W*V.t and U*W.t*V.t) and two ways to calculate t (±third column of U), which means that there are four combinations of Rt, only one of which is valid.
Compute all four combinations, and choose the one that geometrically corresponds to the situation where a reconstructed point is in front of both cameras. I actually do this by carrying through and calculating the resulting P2 = [R|t] and triangulating the 3D position of a few correspondences in normalised coordinates to ensure that they have a positive depth (z-coord).
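As a sketch of this step (assuming E_fixed is the constrained essential matrix from the earlier snippet), the four candidate (R, t) pairs come from the SVD of E, with W the standard 90-degree rotation about z; the positive-depth test described above then picks the valid pair:

import numpy as np

U, _, Vt = np.linalg.svd(E_fixed)
W = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])

R_candidates = [U @ W @ Vt, U @ W.T @ Vt]
# A proper rotation must have det(R) = +1; flip the sign if needed.
R_candidates = [R if np.linalg.det(R) > 0 else -R for R in R_candidates]
t_candidates = [U[:, 2], -U[:, 2]]

candidates = [(R, t) for R in R_candidates for t in t_candidates]  # 4 combos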
5. Triangulate in 3D
Finally, combine the recovered 3x4 projection matrices with their respective calibration matrices: P'1 = K1P1 and P'2 = K2P2
And triangulate the 3-space coordinates of each 2d point correspondence accordingly, for which I am using the LinearLS method from here.
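A minimal triangulation sketch with OpenCV (pts1 and pts2 are hypothetical 2xN arrays of pixel correspondences, and P1_full = K1 @ P1, P2_full = K2 @ P2 are the full 3x4 projection matrices):

import cv2
import numpy as np

X_h = cv2.triangulatePoints(P1_full, P2_full, pts1, pts2)  # 4xN homogeneous
X = (X_h[:3] / X_h[3]).T                                   # Nx3 Euclidean points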
QUESTIONS:
Are there any howling omissions and/or errors in this method?
My F matrix is apparently accurate (0.22% deflection in the mapping compared to typical coordinate values), but when testing E against x'Ex = 0 using normalised image correspondences the typical error in that mapping is >100% of the normalised coordinates themselves. Is testing E against x'Ex = 0 valid, and if so where is that jump in error coming from?
The error in my fundamental matrix estimation is significantly worse when using RANSAC than the 8pt algorithm, ±50px in the mapping between x and x'. This deeply concerns me.
'Enforcing the internal constraint' still sits very weirdly with me - how can it be valid to just manufacture a new Essential matrix from part of the decomposition of the original?
Is there a more efficient way of determining which combo of R and t to use than calculating P and triangulating some of the normalised coordinates?
My final re-projection error is hundreds of pixels in 720p images. Am I likely looking at problems in the calibration, determination of P-matrices or the triangulation?
The error in my fundamental matrix estimation is significantly worse when using RANSAC than the 8pt algorithm, ±50px in the mapping between x and x'. This deeply concerns me.
Using the 8pt algorithm does not exclude using the RANSAC principle.
When using the 8pt algorithm directly, which points do you use? You have to choose 8 (good) points by yourself.
In theory you can compute a fundamental matrix from any point correspondences, and you often get a degenerate fundamental matrix because the linear equations are not independent. Another point is that the 8pt algorithm uses an overdetermined system of linear equations, so a single outlier will destroy the fundamental matrix.
Have you tried to use the RANSAC result? I bet it represents one of the correct solutions for F.
My F matrix is apparently accurate (0.22% deflection in the mapping compared to typical coordinate values), but when testing E against x'Ex = 0 using normalised image correspondences the typical error in that mapping is >100% of the normalised coordinates themselves. Is testing E against x'Ex = 0 valid, and if so where is that jump in error coming from?
Again, if F is degenerate, x'Fx ≈ 0 can hold for every point correspondence even though F is wrong.
Another reason for your incorrect E may be a switch of the cameras (K1.T * F * K2 instead of K2.T * F * K1). Remember to check: x'Ex = 0.
'Enforcing the internal constraint' still sits very weirdly with me - how can it be valid to just manufacture a new Essential matrix from part of the decomposition of the original?
It is explained in 'Multiple View Geometry in Computer Vision' by Hartley and Zisserman. As far as I know, it amounts to finding the closest matrix (in the Frobenius norm) that satisfies the internal constraint.
You can Google it and there are pdf resources.
Is there a more efficient way of determining which combo of R and t to use than calculating P and triangulating some of the normalised coordinates?
Not as far as I know.
My final re-projection error is hundreds of pixels in 720p images. Am I likely looking at problems in the calibration, determination of P-matrices or the triangulation?
Your rigid body transformation P2 is incorrect because E is incorrect.

Understanding the output of solvepnp?

I have been using solvePnP() to calculate the rotation and translation. But the Euler angles calculated from the obtained rotation matrix gave very erratic values. To track down the problem, I used a fixed set of 2D projection points for my marker and kept the other parameters of solvePnP() constant.
Eg values:
2D points
[219.67473, 242.78395; 363.4151, 238.61298; 503.04855, 234.56117; 501.70917, 628.16742; 500.58069, 959.78564; 383.1756, 972.02679; 262.8746, 984.56982; 243.17044, 646.22925]
The Euler angle theta(x) calculated from the output rotation matrix of solvePnP() was -26.4877.
Next, I incremented only the x value of the first point (i.e. 219.67473) by 0.1 to check the variation of the theta(x) Euler angle (keeping the remaining points and the other parameters constant) and ran solvePnP() again. For that very small change, the values decreased from -19 degrees to -18 degrees (for x coord = 223.074), then suddenly jumped to 27 degrees for a while (for x coord = 223.174 to 226.974), then came down to 1.3 degrees (for x coord = 227.074).
I cannot understand this behaviour at all. Could somebody please explain?
My Euler angle calculation from the rotation matrix uses this procedure.
Try Rodrigues() for the conversion between rotation matrix and rotation vector to make sure everything is clean and right. The non-RANSAC version can be very sensitive to outliers, which create a huge error in the parameters and thus bias the solution. Using the RANSAC version of solvePnP may make it more stable to outliers. For example, adding too much to one of the point coordinates will eventually make it an outlier, and it won't influence the solution after that.
If everything fails, write a series of unit tests: create an artificial set of points in 3D (possibly non-planar), apply a simple translation first, in a second variant apply rotation only, and in a third test apply both. Project using your camera matrix, and then plug your 2D points, 3D points, and projection matrix into your code to find the pose. If the result deviates from the inverse of the translations and rotations you applied to the points, look for a bug in how you feed parameters to PnP.
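A hedged sketch of such a synthetic test (the camera matrix K is assumed given; no lens distortion): build known 3D points, apply a known pose, project them, and check that solvePnP recovers that pose.

import cv2
import numpy as np

obj_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])  # non-planar
rvec_true = np.array([[0.1], [0.2], [0.3]])
tvec_true = np.array([[0.5], [-0.2], [5.0]])

img_pts, _ = cv2.projectPoints(obj_pts, rvec_true, tvec_true, K, None)
ok, rvec_est, tvec_est = cv2.solvePnP(obj_pts, img_pts, K, None)

assert ok
assert np.allclose(rvec_est, rvec_true, atol=1e-6)
assert np.allclose(tvec_est, tvec_true, atol=1e-6)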
It seems the coordinate systems are different. OpenCV uses a right-handed coordinate system with Y pointing downwards. At nghiaho.com it says the calculations are based on this, and if you look at the axes they don't seem to match. I guess you are using Rodrigues for the matrix computation? Try comparing the rotation vectors as well.
