Composing multiple Essential matrices - opencv

I have computed the Essential matrices between Frame [a,b], [b,c] and [c,d]. I have now E_ab, E_bc and E_cd. Is it possible to compute E_ad directly without matching? I am thinking of sth like 3D transformation composing where:
T04 = T01 * T12 * T23 * T34
The solution that I found is to decompose each one it into R|T and construct T_4*4 from R|T and then use the previouse notation of composing T matrices and then convert the final R|T to E. However, I was wondering if there is a direct way without decomposing for two reasons:
Decomposing and re composing (at least in OpenCV) changes the essential matrix a bit
Performance sake
So, How can I combine two (or more) essential matrices directly?

Before answering the original question, I think it might be important to discuss this point from your comment:
Ultimate goal is Bundle Adjustment on the Essential Matrices directly instead of on RT. I am using Single Calibrated Camera
It might be that I'm missing critical details about your project, but there seems to be a few issues with that method:
Minimal representation. Usually, it's better (for both performance and accuracy reasons) to use minimal parametrisations in your estimations. The Essential matrix has six degrees of freedom (which can be seen from its decomposition as [T]_xR). If you estimate your Essential directly, you're optimizing eight parameters instead of six.
Less constrained problem. The Essential matrix imposes coplanarity constraints and doesn't penalize some wrong configurations that would be penalized by reprojection errors that are used in usual bundle adjustment. If you consider two cameras C_1 and C_2 such that a point X_2 expressed in C_2 can be expressed in C_1 via RX_2+T, and that we denote the corresponding point in C_1 as X_1, then the constraint X_1.transpose()*E*X_2=0 imposes that the three vectors T, X_1 and RX_2+T remain coplanar. So, the point X_2 is free to move as long as it reprojects to the epiploar line, even if its ray diverges from X_1:
As you can see in the figure, the red and green points would respect the Essential matrix constraint, however they would have different reprojection errors.
Time complexity. Computing Essentials is really slow compared to alternatives, and so is the recovery of rotation and translations from Essentials, compared to the inverse problem.
Now to come back to the initial problem from the question, if you consider the three cameras C_1, C_2, C_3 and respectively denote their poses as Identity, R_1, R_2, and note the Essentials between C_1 and C_2 as E_1 and the one between C_2 and C_3 as E_2, then you can easily see that the Essential from C_3 to C_1, that I'll note E_3, will be given by
E_3=[R_1T_2+T_1]_x R_1R_2
which with a little bit of reorganization leads to
E_3=R_1E_2 + E_1R_2
This relation which establishes the link between the three Essential matrices seems to indicate that you still will need to extract R_1 and R_2 from E_1 and E_2.
Edit: I derived the relationship above like this: assume y is a 3x1 vector, then
E_3y = [R_1T_2+T_1]_x R_1R_2y
= (R_1T_2+T_1) x (R_1R_2y)
= R_1T_2 x R_1R_2y + T_1 x R_1R_2y #for the next line, note that crossproduct is invariant under rotation
= R_1(T_2 x R_2 y) + T_1 x R_1R_2 y
= R_1[T_2]_x R_2 y + [T_1]x_x R_1 R_2 y
= R_1 E_2 y + E_1 R_2 y
= (R_1E_2 + E_1R_2) y
Since this holds for any arbitrary y, this means that E_3=R_1E_2+E_1R_2.

Related

How to align two Point Clouds given the set of points and point-to-point correspondence?

Suppose I have two pointclouds [x1, x2, x3...] and [y1, y2, y3...]. These two pointclouds should be as close as possible. There are a lot of algorithms and deep learning techniques for the pointcloud registration problems. But I have the extra information that: points x1 and y1 should be aligned, x2 and y2 should be aligned, and so on.
So the order of the points in both point clouds is the same. How can I use this to properly get the transformation matrix to align these two-point clouds?
Note: These two points clouds are not exactly the same. Actually, I had ground truth point cloud [x1,x2,x3...] and I tried to reconstruct another point cloud as [y1,y2,y3...]. Now I want to match them and visualize them if the reconstruction is good or not.
The problem you are facing is an overdetermined system of equations, which is solvable with a closed-form expression. No need for iterative methods like ICP, since you have the correspondence between points.
If you're looking for a rigid transform (that allows scaling, rotation and translation but no shearing), you want Umeyama's algorithm [3], which is a closed form as well, there is a Python implementation here: https://gist.github.com/nh2/bc4e2981b0e213fefd4aaa33edfb3893
If you are looking for an affine transform between your point clouds, i.e a linear transform A (that allows shearing, see [2]) as well as a translation t (which is not linear):
Then, each of your points must satisfy the equation:
y = Ax + t.
Here we assume the following shapes: y:(d,n), A:(d,d), x:(d,n), t:(d,1) if each cloud has n points in R^d.
You can also write it in homogeneous notation, by adding an extra coordinate, see [1]. This results in a linear system y=Mx, and a lot (assuming n>d) of pairs (x,y) that satisfy this equation (i.e. an overdetermined system).
You can therefore solve this using a closed-form least square method:
# Inputs:
# - P, a (n,dim) [or (dim,n)] matrix, a point cloud of n points in dim dimension.
# - Q, a (n,dim) [or (dim,n)] matrix, a point cloud of n points in dim dimension.
# P and Q must be of the same shape.
# This function returns :
# - Pt, the P point cloud, transformed to fit to Q
# - (T,t) the affine transform
def affine_registration(P, Q):
transposed = False
if P.shape[0] < P.shape[1]:
transposed = True
P = P.T
Q = Q.T
(n, dim) = P.shape
# Compute least squares
p, res, rnk, s = scipy.linalg.lstsq(np.hstack((P, np.ones([n, 1]))), Q)
# Get translation
t = p[-1].T
# Get transform matrix
T = p[:-1].T
# Compute transformed pointcloud
Pt = P#T.T + t
if transposed: Pt = Pt.T
return Pt, (T, t)
Opencv has a function called getAffineTransform(), however it only takes 3 pairs of points as input. https://theailearner.com/tag/cv2-getaffinetransform/. This won't be robust for your case (if e.g. you give it the first 3 pairs of points).
References:
[1] https://web.cse.ohio-state.edu/~shen.94/681/Site/Slides_files/transformation_review.pdf#page=24
[2] https://docs.opencv.org/3.4/d4/d61/tutorial_warp_affine.html
[3] https://stackoverflow.com/a/32244818/4195725
As another user already mentioned, the ICP algorithm (implementation in PCL can be found here) can be used to register two point clouds to each other. However this only works locally, so the clouds have to be aligned first.
I don't think there is a global registration in PCL at the moment, but I've used OpenGR which has a PCL wrapper.
If you know for sure that x1 is near y1, x2 is near y2 etc. you can do a manual alignment which will be a lot faster than global alignment:
Translate 2nd cloud by vector y1-x1
Rotate vector y2-y1 into vector x2-x1
Then refine it using ICP.
This does not account for measurement errors, so using the matrix estimation above will be useful if your data is not 100% correct.
VTK's vtkLandmarkTransform also does the same thing, with support for RigidBody/Similarity/Affine transformation:
// need at least four pairs of points in sourcePoint and targetPoints,
// can pick more, but probably not too many
vtkLandmarkTransform landmarkTransform = new vtkLandmarkTransform();
landmarkTransform.SetSourceLandmarks(sourcePoints); // source is to be transformed to match the target
landmarkTransform.SetTargetLandmarks(targetPoints); // target stays still
landmarkTransform.SetMode(VTK_Modes.AFFINE);
landmarkTransform.Modified(); // do the calculation
landmarkTransform.GetMatrix(mtx);
// now you can apply the mtx to all points

Trying to do PCA analysis on interest rate swaps data (multivariate time series)

I have a data set with 20 non-overlapping different swap rates (spot1y, 1y1y, 2y1y, 3y1y, 4y1y, 5y2y, 7y3y, 10y2y, 12y3y...) over the past year.
I want to use PCA / multiregression and look at residuals in order to determine which sectors on the curve are cheap/rich. Has anyone had experience with this? I've done PCA but not for time series. I'd ideally like to model something similar to the first figure here but in USD.
https://plus.credit-suisse.com/rpc4/ravDocView?docid=kv66a7
Thanks!
Here are some broad strokes that can help answer your question. Also, that's a neat analysis from CS :)
Let's be pythonistas and use NumPy. You can imagine your dataset as a 20x261 array of floats. The first place to start is creating the array. Suppose you have a CSV file storing the raw data persistently. Then a reasonable first step to load the data would be something as simple as:
import numpy
x = numpy.loadtxt("path/to/my/file")
The object x is our raw time series matrix, and we verify the truthness of x.shape == (20, 261). The next step is to transform this array into it's covariance matrix. Whether it has been done on the raw data already, or it still has to be done, the first step is centering each time series on it's mean, like this:
x_centered = x - x.mean(axis=1, keepdims=True)
The purpose of this step is to help simplify any necessary rescaling, and is a very good habit that usually shouldn't be skipped. The call to x.mean uses the parameters axis and keepdims to make sure each row (e.g. the time series for spot1yr, ...) is centered with it's mean value.
The next steps are to square and scale x to produce a swap rate covariance array. With 2-dimensional arrays like x, there are two ways to square it-- one that leads to a 261x261 array and another that leads to a 20x20 array. It's the second array we are interested in, and the squaring procedure that will work for our purposes is:
x_centered_squared = numpy.matmul(x_centered, x_centered.transpose())
Then, to scale one can chose between 1/261 or 1/(261-1) depending on the statistical context, which looks like this:
x_covariance = x_centered_squared * (1/261)
The array x_covariance has an entry for how each swap rate changes with itself, and changes with any one of the other swap rates. In linear-algebraic terms, it is a symmetric operator that characterizes the spread of each swap rate.
Linear algebra also tells us that this array can be decomposed into it's associated eigen-spectrum, with elements in this spectrum being scalar-vector pairs, or eigenvalue-eigenvector pairs. In the analysis you shared, x_covariance's eigenvalues are plotted in exhibit two as percent variance explained. To produce the data for a plot like exhibit two (which you will always want to furnish to the readers of your PCA), you simply divide each eigenvalue by the sum of all of them, then multiply each by 100.0. Due to the convenient properties of x_covariance, a suitable way to compute it's spectrum is like this:
vals, vects = numpy.linalg.eig(x_covariance)
We are now in a position to talk about residuals! Here is their definition (with our namespace): residuals_ij = x_ij − reconstructed_ij; i = 1:20; j = 1:261. Thus for every datum in x, there is a corresponding residual, and to find them, we need to recover the reconstructed_ij array. We can do this column-by-column, operating on each x_i with a change of basis operator to produce each reconstructed_i, each of which can be viewed as coordinates in a proper subspace of the original or raw basis. The analysis describes a modified Gram-Schmidt approach to compute the change of basis operator we need, which ensures this proper subspace's basis is an orthogonal set.
What we are going to do in the approach is take the eigenvectors corresponding to the three largest eigenvalues, and transform them into three mutually orthogonal vectors, x, y, z. Research the web for active discussions and questions geared toward developing the Gram-Schmidt process for all sorts of practical applications, but for simplicity let's follow the analysis by hand:
x = vects[0] - sum([])
xx = numpy.dot(x, x)
y = vects[1] - sum(
(numpy.dot(x, vects[1]) / xx) * x
)
yy = numpy.dot(y, y)
z = vects[2] - sum(
(numpy.dot(x, vects[2]) / xx) * x,
(numpy.dot(y, vects[2]) / yy) * y
)
It's reasonable to implement normalization before or after this step, which should be informed by the data of course.
Now with the raw data, we implicitly made the assumption that the basis is standard, we need a map between {e1, e2, ..., e20} and {x,y,z}, which is given by
ch_of_basis = numpy.array([x,y,z]).transpose()
This can be used to compute each reconstructed_i, like this:
reconstructed = []
for measurement in x.transpose().tolist():
reconstructed.append(numpy.dot(ch_of_basis, measurement))
reconstructed = numpy.array(reconstructed).transpose()
And then you get the residuals by subtraction:
residuals = x - reconstructed
This flow obviously might need further tuning, but it's the gist of how to do compute all the residuals. To get that periodic bar plot, take the average of each row in residuals.

Trying to understand expected value in Linear Regression

I'm having trouble understanding a lecture slide in my school's machine learning course
why does the expected value of Y = f(X)? what does it mean
my understanding is that X, Y are vectors and f(X) outputs a vector of Y where each individual value (y_i) in the Y vector corresponds to a f(x_i) where x_i is the value in X at index i; But now it's taking the expected value of Y, which is going to be a single value, so how is that equal to f(X)?
X, Y (uppercase) are vectors
x_i,y_i (lowercase with subscript) are scalars at index i in X,Y
There is a lot of confusion here. First let's start with definitions
Definitions
Expectation operator E[.]: Takes a random variable as an input and gives a scalar/vector as an output. Let's say Y is a normally distributed random variable with mean Mu and Variance Sigma^{2} (usually stated as:
Y ~ N( Mu , Sigma^{2} ), then E[Y] = Mu
Function f(.): Takes a scalar/vector (not a random variable) and gives a scalar/vector. In this context it is an affine function, that is f(X) = a*X + b where a and b are fixed constants.
What's Going On
Now you can view linear regression from two angles.
Stats View
One angle assumes that your response variable-Y- is a normally distributed random variable because:
Y ~ a*X + b + epsilon
where
epsilon ~ N( 0 , sigma^sq )
and X is some other distribution. We don't really care how X is distributed and treat it as given. In that case the conditional distribution is
Y|X ~ N( a*X + b , sigma^sq )
Notice here that a,b and also X is a number, there is no randomness associated with them.
Maths View
The other view is the math view where I assume that there is a function f(.) that governs the real life process, that if in real life I observe X, then f(X) should be the output. Of course this is not the case and the deviations are assumed to be due to various reasons such as gauge error etc. The claim is that this function is linear:
f(X) = a*X + b
Synthesis
Now how do we combine these? Well, as follows:
E[Y|X] = a*X + b = f(X)
About your question, I first would like to challenge that it should be Y|X and not Y by itself.
Second, there are tons of possible ontological discussions over what each term here represents in real life. X,Y (uppercase) could be vectors. X,Y (uppercase) could also be random variables. A sample of these random variables might be stored in vectors and both would be represented with uppercase letters (the best way is to use different fonts for each). In this case, your sample will become your data. Discussions about the general view of the model and its relevance to real life should be made at random variable level. The way to infer the parameters, how linear regression algorithms works should be made at matrix and vectors levels. There could be other discussion where you should care about both.
I hope this overly unorganized answer helps you. In general if you want to learn such stuff, be sure you know what kind of math objects and operators you are dealing with , what do they take as input and what are their relevance to real life.

Why to add mean in the reconstruction in PCA?

Suppose that X is our dataset (still not centered) and X_cent is our centered dataset (X_cent = X - mean(X)).
If we are doing PCA projection in this way Z_cent = F*X_cent, where F is matrix of principal components, that is pretty obvious that we need to add mean(X) after reconstruction Z_cent.
But what if we are doing PCA projection in this way Z = F*X? In this case we don't need to add mean(X) after reconstruction, but it gives us another result.
I think that something wrong with this procedure (construction-reconstruction), when it applied to the non-centered data (X in our case). Can anyone explain how it works? Why can't we do construction/reconstruction phase without this subracting/adding mean?
Thank you in advance.
If you retain all Principal Components, then reconstruction of the centered and non-centered vectors as described in your question would be identical. The problem (as indicated in your comments) is that you are only retaining K principal components. When you drop PCs, you lose information so the reconstruction will contain errors. Since you don't have to reconstruct the mean in one of the reconstructions, you don't introduce errors w.r.t. the mean there so the reconstruction errors of the two versions will be different.
Reconstruction with fewer than all PCs isn't quite as simple as multiplying by the transpose of the eigenvectors (F') because you need to pad your transformed data with zeros but to keep things simple, I'll ignore that here. Your two reconstructions look like this:
R1 = F'*F*X
R2 = F'*F*X_cent + X_mean
= F'*F*(X - X_mean) + X_mean
= F'*F*X - F'*F*X_mean + X_mean
Since the reconstruction is lossy, in general, F'*F*Y != Y for matrix Y. If you retrained all PCs, you would have R1 - R2 = 0. But since you are only retaining a subset of the PCs, your two reconstructions will differ by
R2 - R1 = X_mean - F'*F*X_mean
Your follow-up question in the comments regarding why it's better to reconstruct X_cent instead of X is a bit more nuanced and really depends on why you are doing PCA in the first place. The most fundamental reason is that the PCs are with respect to the mean in the first place so by not centering the data prior to transforming/rotating, you aren't really decorrelating the features. Another reason is that the numeric values of the transformed data are often orders of magnitude smaller when centering the data first.

Intuition about the kernel trick in machine learning

I have successfully implemented a kernel perceptron classifier, that uses an RBF kernel. I understand that the kernel trick maps features to a higher dimension so that a linear hyperplane can be constructed to separate the points. For example, if you have features (x1,x2) and map it to a 3-dimensional feature space you might get: K(x1,x2) = (x1^2, sqrt(x1)*x2, x2^2).
If you plug that into the perceptron decision function w'x+b = 0, you end up with: w1'x1^2 + w2'sqrt(x1)*x2 + w3'x2^2which gives you a circular decision boundary.
While the kernel trick itself is very intuitive, I am not able to understand the linear algebra aspect of this. Can someone help me understand how we are able to map all of these additional features without explicitly specifying them, using just the inner product?
Thanks!
Simple.
Give me the numeric result of (x+y)^10 for some values of x and y.
What would you rather do, "cheat" and sum x+y and then take that value to the 10'th power, or expand out the exact results writing out
x^10+10 x^9 y+45 x^8 y^2+120 x^7 y^3+210 x^6 y^4+252 x^5 y^5+210 x^4 y^6+120 x^3 y^7+45 x^2 y^8+10 x y^9+y^10
And then compute each term and then add them together? Clearly we can evaluate the dot product between degree 10 polynomials without explicitly forming them.
Valid kernels are dot products where we can "cheat" and compute the numeric result between two points without having to form their explicit feature values. There are many such possible kernels, though only a few have been getting used a lot on papers / practice.
I'm not sure if I'm answering your question, but as I remember the "trick" is that you don't explicitly calculate inner products. The perceptron calculates a straight line that separates the clusters. To get curved lines or even circles, instead of changing the perceptron you can change the space that contains the clusters. This is done by using a transformation usually called phi that transform coordinates to from one space to another. The perceptron algorithm is then applied in the new space where it produces a straight line, but when that line then is transformed back to the original space it can be curved.
The trick is that the perceptron only needs to know the inner product of the points of the clusters it is trying to separate. This means that we only need to be able to calculate the inner product of the transformed points. This is what the kernel does K(x,y) = <phi(x), phi(y)> where < . , . > is the inner product in the new space. This means that there is no need to do all the transformations to the new space and back, we don't even need to explicitly know what the transformation phi() is. All that is needed is that K defines an inner product in some space and hope that this inner product and space is useful for separating our clusters.
I think that there was some theorem that says that if the space represented by the kernel has higher dimensionality than the original space it is likely that it will separate the clusters better.
There is really not much to it
The weight in the higher space is
w = sum_i{a_i^t * Phi(x_i)}
and the input vector in the higher space
Phi(x)
so that the linear classification in the higher space is
w^t * input + c > 0
so if you put these together
sum_i{a_i * Phi(x_i)} * Phi(x) + c = sum_i{a_i * Phi(x_i)^t * Phi(x)} + c > 0
the last dot product's computational complexity is linear to the number of dimensions (often intractable, or not wanted)
We solve this by going over to the kernel "magic answer to the dot product"
K(x_i, x) = Phi(x_i)^t * Phi(x)
which gives
sum_i{a_i * K(x_i, x)} + c > 0

Resources