How can I estimate the probability of a partial state from a Kalman filter? - opencv

I have a Kalman filter tracking a point, with a state vector (x, y, dx/dt, dy/dt).
At a given update, I have a set of candidate points which may correspond to the tracked points. I would like to iterate through these candidates and choose the one most likely to correspond to the tracked point, but only if the probability of that point corresponding to the tracked point is greater than a threshold (e.g. p > 0.5).
Therefore I need to use the covariance and state matrices of the filter to estimate this probability. How can I do this?
Additionally, note that my state vector is four dimensions, but the measurements are in two dimensions (x, y).

When you predict the measurements with y = Hx you also compute the covariance of y as H*P*H.T. This property is why we use variance in the Kalman Filter.
The geometrical way to understand how far a given point is from your predicted point is a error ellipse or confidence region. A 95% confidence region is the ellipse scaled to 2*sigma (if that isn't intuitive, you should go read about normal distributions, because that is what the KF thinks it is working on). If the covariance is diagonal, the error ellipse will be axis aligned. If there are co-varying terms (which there may not be if you have not introduced them anywhere via Q or R) then the ellipse will be tilted.
The mathematical way is with the Mahalanobis distance, which just directly formulates the geometrical representation above as a distance. The distance scale is standard deviations, so your P=0.5 corresponds to a distance of 0.67 (again, see normal distributions if this is surprising).

The most probable point (I suppose from detections) will be the nearest point to filter prediction.

Related

Why do we normalize homography or fundamental matrix?

I want to know about why do we normalize the homography or fundamental matrix? Here is the code in particular.
H = H * (1.0 / H[2, 2]) # Normalization step. H is [3, 3] matrix.
I can understand that we have to normalize the data before computing SVD because of instability caused by linear least squares but why do we normalize it in end?
A homography in 3D space has 8 degrees of freedom by definition, mapping from one plane to another using perspective. Such a homography can be defined by giving four points, which makes eight coordinates (scalars).
A 3x3 matrix has 9 elements, so it has 9 degrees of freedom. That is one degree more than needed for a homography.
The homography doesn't change when the matrix is scaled (multiplied by a scalar). All the math works the same. You don't need to normalize your homography matrix.
It is a good idea to normalize.
For one, it makes the arithmetic somewhat tamer. Have some wikipedia links to fields of study because weaving all these into a coherent sentence... doesn't add anything:
Numerical analysis, Condition number, Floating-point arithmetic, Numerical error, Numerical stability, ...
Also, normalization makes the matrix easier for humans to interpret. The most common normalization is to scale the matrix such that the last element becomes 1. That is convenient because this whole math happens in a projective space, where the projection causes points to be mapped to the w=1 plane, making vectors have a 1 for the last element.
How is the homography matrix provided to you?
For example, in the scene that some library function calculates and provides the homography matrix to you,
if the function specification doesn't mention about the scale...
In an extreme case, the function can be implemented as:
Matrix3x3 CalculateHomographyMatrix( some arguments )
{
Matrix3x3 H = ...; //Homogoraphy Calculation
return Non_Zero_Random_Value * H; //Wow!
}
Element values may become very large or very small and using such values to your process may cause problems (floating point computation errors).

K-means++ clustering Algorithm

The algorithm for the K-means++ is:
Take one centroid c(i), chosen uniformly at random from the dataset.
Take a new Centroid c(i), choosing an instance x(i) from the dataset with the probability
D(X(i))^2/Sum(D(X(j))^2) from j=1 to m, where D(X(i)) is the distance between the instance and the closest centroid which is selected.
What is this parameter m used in the summation of the probability?
It might have been helpful to see the original formulation, but the algorithm is quite clear: during the initialization phase, for each point not used as a centroid, calculate the distance between said point and the nearest centroid, that will be the distance D(X[i]), the pick a random point in this set of points with probability weighted with D(X[i])^2
In your formulation it seems you got m points unused.

Reconstruct image from eigenvectors obtained from solving the eigenfunction of Hamiltonian operator in matrix form

I have an Image I
I am trying to do Automatic Object Extraction using Quantum Mechanics
Each pixel in an image is considered as a potential field, V(x,y) and hence each wave (eigen) function represents a meaningful region.
2D Time-independent Sschrodinger's equation
Multiplying both sides by
We get,
Rewriting the Laplacian using Finite Difference approach
where Ni is the set of neighbours with index i, and |Ni| is the cardinality of, i.e. the number of elements in Ni
Combining the above two equations, we get:
where M is the number of elements in
Now,the left hand side of the equation is a measure of how similar the labels in a neighbourhood are, i.e. a measure of spatial coherence.
Now, for applying this to images, the potential V is given as the pixel intensities.
Here, V is the pixel intensities
The right hand side is a measure of how close the pixel values in a segment are to a constant value E.
Now, the wave functions can be numerically calculated by solving the eigenvectors of Hamiltonian operator in matrix form which is
for i = j
for
and elsewhere 0
Now, in this paper it is said that first we have to find the maximum and minimum eigenvalues and then calculate the eigenvectors with eigenvalues closest to a number of values regularly selected between the minimum and maximum eigenvalues. the number is 300.
I have calculated the 300 eigenvectors.
And then the absolute square of the eigenvectors are thresholded to obtain the segments.
Fine upto this part.
Now, how do I reconstruct the eigenvectors into a 2D image so as to get the potential segments in the image?

Is my understanding right about "Intensity" of Lukas-Kanade method of optical flow?

From 'http://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_lucas_kanade.html', it says that intensities are gradients. f(x), f(y) are gradients and f(t) is similarly gradient along time. My confusion starts now. In above link, f(x) and f(y) are also in derivative form but we cannot calculate derivative along x and y because we don't know where the same point goes to, actually that's what we are going to find in this method. So I wonder what it says is that, since it says f(t) is gradient of one point along time so, can I assume f(t) like average of gradient of a point that is collected by several certain period and f(x) and f(y) are collected every period that average of f(t) gradient is collected?
For example, if f(t) is calculated every 20ms and trying to calculate average in every 100ms. In every 100ms f(x) and f(y) is calculated. Is my understanding right?
If wrong then, what is difference between f(x), f(y) and f(t)?
They assume that the intensity of each pixel does not change over time. Based on this assumption, you can calculate the optical flow formula and get the value of U and V, which is the vector of movement.
Let I(x,y) be an intensity value at a pixel position (x,y), than as in the description fx(x,y) is the derivative in x direction and fy(x,y) in the y direction and ft(x,y) in the temporal direction at the position (x,y).
However, in the discret world in our programm, we could only approximate these derivative we also call this gradient. Therfore e.g. the Sobel filters could be used. Or in a more easy way the differenz operator i.e.
fx(x,y) = I(x,y) - I(x-1,y)
fy(x,y) = I(x,y) - I(x,y-1)
similar is that for the temporal gradient but now with the image intensity of the previous image A(x,y)
ft(x,y) = I(x,y) - A(x,y)
idealy and mostly in practice A is one frame befor I, but in the theory A could be taken from more previous frames but this will make ft less accurate.

3D reconstruction from two calibrated cameras - where is the error in this pipeline?

There are many posts about 3D reconstruction from stereo views of known internal calibration, some of which are excellent. I have read a lot of them, and based on what I have read I am trying to compute my own 3D scene reconstruction with the below pipeline / algorithm. I'll set out the method then ask specific questions at the bottom.
0. Calibrate your cameras:
This means retrieve the camera calibration matrices K1 and K2 for Camera 1 and Camera 2. These are 3x3 matrices encapsulating each camera's internal parameters: focal length, principal point offset / image centre. These don't change, you should only need to do this once, well, for each camera as long as you don't zoom or change the resolution you record in.
Do this offline. Do not argue.
I'm using OpenCV's CalibrateCamera() and checkerboard routines, but this functionality is also included in the Matlab Camera Calibration toolbox. The OpenCV routines seem to work nicely.
1. Fundamental Matrix F:
With your cameras now set up as a stereo rig. Determine the fundamental matrix (3x3) of that configuration using point correspondences between the two images/views.
How you obtain the correspondences is up to you and will depend a lot on the scene itself.
I am using OpenCV's findFundamentalMat() to get F, which provides a number of options method wise (8-point algorithm, RANSAC, LMEDS).
You can test the resulting matrix by plugging it into the defining equation of the Fundamental matrix: x'Fx = 0 where x' and x are the raw image point correspondences (x, y) in homogeneous coordinates (x, y, 1) and one of the three-vectors is transposed so that the multiplication makes sense. The nearer to zero for each correspondence, the better F is obeying it's relation. This is equivalent to checking how well the derived F actually maps from one image plane to another. I get an average deflection of ~2px using the 8-point algorithm.
2. Essential Matrix E:
Compute the Essential matrix directly from F and the calibration matrices.
E = K2TFK1
3. Internal Constraint upon E:
E should obey certain constraints. In particular, if decomposed by SVD into USV.t then it's singular values should be = a, a, 0. The first two diagonal elements of S should be equal, and the third zero.
I was surprised to read here that if this is not true when you test for it, you might choose to fabricate a new Essential matrix from the prior decomposition like so: E_new = U * diag(1,1,0) * V.t which is of course guaranteed to obey the constraint. You have essentially set S = (100,010,000) artificially.
4. Full Camera Projection Matrices:
There are two camera projection matrices P1 and P2. These are 3x4 and obey the x = PX relation. Also, P = K[R|t] and therefore K_inv.P = [R|t] (where the camera calibration has been removed).
The first matrix P1 (excluding the calibration matrix K) can be set to [I|0] then P2 (excluding K) is R|t
Compute the Rotation and translation between the two cameras R, t from the decomposition of E. There are two possible ways to calculate R (U*W*V.t and U*W.t*V.t) and two ways to calculate t (±third column of U), which means that there are four combinations of Rt, only one of which is valid.
Compute all four combinations, and choose the one that geometrically corresponds to the situation where a reconstructed point is in front of both cameras. I actually do this by carrying through and calculating the resulting P2 = [R|t] and triangulating the 3d position of a few correspondences in normalised coordinates to ensure that they have a positive depth (z-coord)
5. Triangulate in 3D
Finally, combine the recovered 3x4 projection matrices with their respective calibration matrices: P'1 = K1P1 and P'2 = K2P2
And triangulate the 3-space coordinates of each 2d point correspondence accordingly, for which I am using the LinearLS method from here.
QUESTIONS:
Are there any howling omissions and/or errors in this method?
My F matrix is apparently accurate (0.22% deflection in the mapping compared to typical coordinate values), but when testing E against x'Ex = 0 using normalised image correspondences the typical error in that mapping is >100% of the normalised coordinates themselves. Is testing E against xEx = 0 valid, and if so where is that jump in error coming from?
The error in my fundamental matrix estimation is significantly worse when using RANSAC than the 8pt algorithm, ±50px in the mapping between x and x'. This deeply concerns me.
'Enforcing the internal constraint' still sits very weirdly with me - how can it be valid to just manufacture a new Essential matrix from part of the decomposition of the original?
Is there a more efficient way of determining which combo of R and t to use than calculating P and triangulating some of the normalised coordinates?
My final re-projection error is hundreds of pixels in 720p images. Am I likely looking at problems in the calibration, determination of P-matrices or the triangulation?
The error in my fundamental matr1ix estimation is significantly worse
when using RANSAC than the 8pt algorithm, ±50px in the mapping between
x and x'. This deeply concerns me.
Using the 8pt algorithm does not exclude using the RANSAC principle.
When using the 8pt algorithm directly which points do you use? You have to choose 8 (good) points by yourself.
In theory you can compute a fundamental matrix from any point correspondences and you often get a degenerated fundamental matrix because the linear equations are not independend. Another point is that the 8pt algorithm uses a overdetermined system of linear equations so that one single outlier will destroy the fundamental matrix.
Have you tried to use the RANSAC result? I bet it represents one of the correct solutions for F.
My F matrix is apparently accurate (0.22% deflection in the mapping
compared to typical coordinate values), but when testing E against
x'Ex = 0 using normalised image correspondences the typical error in
that mapping is >100% of the normalised coordinates themselves. Is
testing E against xEx = 0 valid, and if so where is that jump in error
coming from?
Again, if F is degenerated, x'Fx = 0 can be for every point correspondence.
Another reason for you incorrect E may be the switch of the cameras (K1T * E * K2 instead of K2T * E * K1). Remember to check: x'Ex = 0
'Enforcing the internal constraint' still sits very weirdly with me -
how can it be valid to just manufacture a new Essential matrix from
part of the decomposition of the original?
It is explained in 'Multiple View Geometry in Computer Vision' from Hartley and Zisserman. As far as I know it has to do with the minimization of the Frobenius norm of F.
You can Google it and there are pdf resources.
Is there a more efficient way of determining which combo of R and t to
use than calculating P and triangulating some of the normalised
coordinates?
No as far as I know.
My final re-projection error is hundreds of pixels in 720p images. Am
I likely looking at problems in the calibration, determination of
P-matrices or the triangulation?
Your rigid body transformation P2 is incorrect because E is incorrect.

Resources