OpenCV CV findHomography assertion error - counter => 4

I'm currently finishing my evaluation tool for interest point detectors. In the last steps I ran into a confusing error.
Mat findHomography(InputArray srcPoints, InputArray dstPoints, int method=0, double ransacReprojThreshold=3, OutputArray mask=noArray() )
The srcPoints and dstPoints are vector<Point2f> which store the corresponding points of the matched keypoints. So far nothing special - it's like in the tutorials.
But when I use RANSAC and have a vector<Point2f> with size in the range [0, ..., 4], I get an assertion error saying the counter should be greater than or equal to four.
Question 1: Does the algorithm need at least four points to decide what belongs to the current model or not and to create the consensus set?
Question 2: Is there any documentation about this? (I took a look at the doc and the tutorials.)
Please note that I have already seen this question, but there is no satisfying answer for the behaviour of RANSAC there. Or should I just accept that this method needs at least four points to find the homography?
Thanks for your help.

A homography cannot be computed with fewer than 4 pairs of points. That is because with only 3 points there is a perspective ambiguity. Picture a triangle
a
b     c
in image 1. In image 2 the points have been transformed to look like this
a
b c
The distance between b and c has been cut in half. Unfortunately you don't know if that is because point c got closer to you or farther from you. With a 4th point the difference becomes clear.
Here is a square in image 1
a d
b c
here d and c rotated towards you
  d
a
b c
and here they rotated away from you.
a
  d
b c
I don't see this requirement in the OpenCV documentation, but if you find any resources on homography calculation you won't have to read very far before you find this requirement and a more rigorous proof that 4 points are sufficient.
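In code form, that requirement just means guarding the call. A minimal sketch (my illustration, not from the original answers; the variable names are hypothetical):

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Point2f> matchedSrc, matchedDst;  // corresponding points from your matcher
// ... fill matchedSrc / matchedDst ...
cv::Mat H;
if (matchedSrc.size() >= 4 && matchedSrc.size() == matchedDst.size()) {
    // RANSAC needs at least 4 correspondences to hypothesize a homography
    H = cv::findHomography(matchedSrc, matchedDst, cv::RANSAC, 3.0);
} else {
    // too few matches: skip this image pair instead of triggering the assertion
}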

RANSAC is used to select 4 pairs of matching points from a larger set of correspondences (i.e. when srcPoints.size() >= 4). That's why you get an error if srcPoints.size() < 4.
You need at least 4 correspondences simply because a homography matrix H has 8 degrees of freedom, hence 8 linear equations are required to find a solution. Since each pair of points generates two linear equations (one from the x and one from the y coordinate), you need a total of at least 4 correspondences.
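To make the "two equations per pair" step explicit (my addition; this is the standard DLT rearrangement, with the overall scale of H fixed so that h_9 is not a free parameter): for a correspondence (x, y) <-> (x', y'),

$$x' = \frac{h_1 x + h_2 y + h_3}{h_7 x + h_8 y + h_9}, \qquad y' = \frac{h_4 x + h_5 y + h_6}{h_7 x + h_8 y + h_9},$$

and clearing the denominators gives two equations per correspondence that are linear in h_1, ..., h_9, so 4 correspondences supply the 8 equations needed.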

Related

OpenCV: What does it mean when the number of inliers returned by recoverPose() function is 0?

I've been working on a pose estimation project and one of the steps is finding the pose using the recoverPose function of OpenCV.
int cv::recoverPose(InputArray E,
InputArray points1,
InputArray points2,
InputArray cameraMatrix,
OutputArray R,
OutputArray t,
InputOutputArray mask = noArray()
)
I have all the required info: essential matrix E, key points in image 1 points1, corresponding key points in image 2 points2, and the cameraMatrix. However, the one thing that still confuses me a lot is the int value (i.e. the number of inliers) returned by the function. As per the documentation:
Recover relative camera rotation and translation from an estimated essential matrix and the corresponding points in two images, using cheirality check. Returns the number of inliers which pass the check.
However, I don't completely understand that yet. I'm concerned with this because, at some point, the yaw angle (calculated using the output rotation matrix R) suddenly jumps by more than 150 degrees. For that particular frame, the number of inliers is 0. So, as per the documentation, no points passed the cheirality check. But still, what does it mean exactly? Can that be the reason for the sudden jump in yaw angle? If yes, what are my options to avoid that? As the process is iterative, that one sudden jump affects all the further poses!
This function decomposes the essential matrix E into R and t. However, you can get up to 4 solutions, i.e. pairs of R and t. Of these 4, only one is physically realizable, meaning that the other 3 project the 3D points behind one or both cameras.
The cheirality check is what you use to find that one physically realizable solution, and this is why you need to pass matching points into the function. It will use the matching 2D points to triangulate the corresponding 3D points using each of the 4 R and t pairs, and choose the one for which it gets the most 3D points in front of both cameras. This accounts for the possibility that some of the point matches can be wrong. The number of points that end up in front of both cameras is the number of inliers that the function returns.
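For reference, OpenCV also exposes the four candidates directly; a hedged sketch of what recoverPose is choosing among (the candidates are (R1, t), (R1, -t), (R2, t), (R2, -t)):

#include <opencv2/opencv.hpp>

cv::Mat R1, R2, t;
cv::decomposeEssentialMat(E, R1, R2, t);
// recoverPose triangulates the matches with each of the four (R, t) candidates
// and keeps the one that puts the most 3D points in front of both cameras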
So, if the number of inliers is 0, then something went very wrong. Either your E is wrong, or the point matches are wrong, or both. In this case you simply cannot estimate the camera motion from those two images.
There are several things you can check (all three are sketched in code after this list).
After you call findEssentialMat you get the inliers from the RANSAC used to find E. Make sure that you are passing only those inlier points into recoverPose. You don't want to pass in all the points that you passed into findEssentialMat.
Before you pass E into recoverPose, check that it has rank 2. If it does not, you can enforce the rank-2 constraint on E: take the SVD of E, set the smallest singular value to 0, and then reconstitute E.
After you get R and t from recoverPose, you can check that R is indeed a rotation matrix with determinant equal to 1. If the determinant is equal to -1, then R is a reflection, and things have gone wrong.
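All three checks together, as a minimal sketch (my illustration; pts1, pts2 and K are assumed to hold your matched points and camera matrix):

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

std::vector<cv::Point2f> pts1, pts2;  // matched keypoints (assumed filled)
cv::Mat K;                            // camera matrix (assumed filled)

cv::Mat mask;
cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC, 0.999, 1.0, mask);

// 1. keep only the RANSAC inliers for recoverPose
std::vector<cv::Point2f> in1, in2;
for (int i = 0; i < mask.rows; ++i)
    if (mask.at<uchar>(i)) { in1.push_back(pts1[i]); in2.push_back(pts2[i]); }

// 2. enforce rank 2: zero the smallest singular value and reconstitute E
cv::SVD svd(E, cv::SVD::FULL_UV);
cv::Mat w = cv::Mat::diag(svd.w);
w.at<double>(2, 2) = 0.0;
E = svd.u * w * svd.vt;

// 3. recover the pose and sanity-check the result
cv::Mat R, t;
int inliers = cv::recoverPose(E, in1, in2, K, R, t);
if (inliers == 0 || std::abs(cv::determinant(R) - 1.0) > 1e-6) {
    // E or the matches are bad: reject this frame rather than propagate the pose
}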

Image point to point matching using intrinsics, extrinsics and third-party depth

I want to reopen a question similar to one which somebody posted a while ago, with one major difference.
The previous post is https://stackoverflow.com/questions/52536520/image-matching-using-intrinsic-and-extrinsic-camera-parameters
and my question is: can I do the matching if I do have the depth?
If it is possible, can someone describe a set of formulas which I have to solve to get the desired matching?
There is also a relevant correspondence formula on slide 16/43:
Depth from Stereo Lecture
What units are all the variables in here? Can someone clarify, please?
Will this formula help me to calculate the desired point to point correspondence?
I know Z (mm, cm, m, whatever unit it is) and x_l (I guess this is the y coordinate of the pixel, so both x_l and x_r are on the same horizontal line, correct me if I'm wrong). I'm not sure whether T is in mm (or cm, m, i.e. a distance unit) and whether f is in pixels/mm (a distance unit) or something else.
Thank you in advance.
EDIT:
So, as @fana said, the solution is indeed a projection.
To my understanding it is P(v) = K(Rv + t), where R is the 3 x 3 rotation matrix (obtained for example from calibration), t is the 3 x 1 translation vector and K is the 3 x 3 intrinsics matrix.
From the following video:
It can be seen that there is translation only in one dimension (because the images are parallel, the translation takes place only on the X-axis), but in other situations, as far as I understand, if the cameras are not on the same parallel line, there is also translation on the Y-axis. What is the translation on the Z-axis which I get through the calibration? Is it some rescale factor due to, for example, different image resolutions? Did I write the projection formula correctly for the general case?
I also want to ask about the whole idea.
Suppose I have 3 cameras: one with a large FOV which gives me color and depth for each pixel (a 3D tensor, color stacked with the corresponding depth), let's call it the first; and two with which I want to do stereo, let's call them the second and the third.
Instead of calibrating the two stereo cameras to each other, my idea is to use the depth from the first camera to calculate the xyz of pixel (u, v) in its corresponding color frame, which can be done easily, and then to project that point onto the second and the third images using the R, t found by calibration between the first camera and each of the second and third, and using the K intrinsics matrices, so the projection matrix seems to be fully known. Am I right?
Assume for the case that FOV of color is big enough to include all that can be seen from the second and the third cameras.
That way, by projecting each x, y, z of the first camera, I can know where the corresponding pixels are in the two other cameras. Is that correct?
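That is the standard pinhole chain. A minimal sketch of the projection described above (my illustration; the function name is made up, and K1, K2, R, t stand for your calibration results between the first camera and one of the others):

#include <opencv2/opencv.hpp>

// Back-project pixel (u, v) with known depth Z from camera 1, then
// project the resulting 3D point into camera 2: m' = K2 (R X + t).
cv::Point2d projectIntoOtherCamera(double u, double v, double Z,
                                   const cv::Matx33d& K1, const cv::Matx33d& K2,
                                   const cv::Matx33d& R, const cv::Vec3d& t)
{
    // 3D point in camera-1 coordinates: X = Z * K1^{-1} * [u, v, 1]^T
    cv::Vec3d X = Z * (K1.inv() * cv::Vec3d(u, v, 1.0));
    cv::Vec3d m = K2 * (R * X + t);               // homogeneous image point
    return cv::Point2d(m[0] / m[2], m[1] / m[2]); // dehomogenize to pixels
}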

Image matching using intrinsic and extrinsic camera parameters

Currently I have a dataset of images (a sequence of frames) and I have the intrinsic camera calibration matrix. Also, for each image I have the extrinsic parameters (rotation and translation).
I would like to know if it is possible to use those parameters to find the correct pixel correspondences between each pair of images.
I found the relationship between the translation (t), the rotation (R) and each correspondence point between two different perspectives.
I guess that using the image above, it is only necessary to fix a point "x" (in homogeneous coordinates) and solve the equation system for "x'", but I do not know which operations to use (notation). If someone knows how to do it using MATLAB, I would appreciate some help.
Also, if there is another way to discover the matching using the same information, I would appreciate someone's help.
Thanks
No, this information is not enough to find point correspondences between the frames. I will first explain what I think you can do with the given information, and then we'll see why it's impossible to get pixel-to-pixel matches from the essential matrix alone.
What you can do. For a point m, you can find the line in the other image on which its match m' lies, by using the fundamental matrix. Let's assume that the X and X' you give in your question are (respectively) projected to m and m', i.e.
//K denotes the intrinsics matrix
m=KX
m'=KX'
Starting with your equation, we have:
X^T E X' = 0 ==> m^T K^{-T} E K^{-1} m' = 0
The matrix K^{-T} E K^{-1}, which we will denote F, is known as the fundamental matrix, and now you have a constraint between 2D points in the image plane:
m^TFm'=0
Note that m and m' are 3-vectors expressed in homogeneous coordinates. The interesting thing to notice here is that Fm' is the line on which m lies in the first image (since the constraint given above is nothing but the dot product between m and Fm'). Similarly, m^T F is the line in the other image on which m' is expected to lie. So, what you can do to find a match for m is to search in a neighborhood of Fm'.
Why you can't get pixel-to-pixel matching. Let's look at what the constraint x^T E x' = 0 means from an intuitive point of view. Basically, what it says is that we expect x, x' and the baseline T to be coplanar. Assume that you fix x, and that you look for points that satisfy x^T E x' = 0. Then, while the x' in your figure satisfies this constraint, every point n (reprojected from y) such as the one in the figure below will also be a good candidate:
which indicates that the correct match depends on your estimation of the depth of x, which you don't have.
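As a small code sketch of the epipolar search described above (my illustration; the function names are made up):

#include <opencv2/opencv.hpp>

// F = K^{-T} E K^{-1}, as derived above
cv::Matx33d fundamentalFromEssential(const cv::Matx33d& E, const cv::Matx33d& K)
{
    return K.inv().t() * E * K.inv();
}

// The epipolar line l = F m' in the first image for a point m' in the second;
// l = (a, b, c) encodes the line a*x + b*y + c = 0, and the match for m'
// should be searched for in a neighborhood of that line.
cv::Vec3d epipolarLine(const cv::Matx33d& F, const cv::Point2d& mPrime)
{
    return F * cv::Vec3d(mPrime.x, mPrime.y, 1.0);
}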

Why does fundamental matrix have 7 degrees of freedom?

There are 9 parameters in the fundamental matrix that relate the pixel coordinates of the left and right images, but it has only 7 degrees of freedom (DOF).
The reasoning for this on several pages that I've searched says :
Homogeneous equations mean we lose a degree of freedom
The determinant of F = 0, therefore we lose another degree of freedom.
I don't understand why those 2 reasons mean we lose 2 DOF - can someone explain it?
We initially have 9 DOF because the fundamental matrix is composed of 9 parameters, which implies that we need 9 corresponding points to compute the fundamental matrix (F). But because of the following two reasons, we only need 7 corresponding points.
Reason 1
We lose 1 DOF because we are using homogeneous coordinates. This is basically a way to represent nD points in vector form by adding an extra dimension, i.e. a 2D point (0,2) can be represented as [0,2,1], in general [x,y,1]. There are useful properties when using homogeneous coordinates with 2D/3D transformations, but I'm going to assume you know that.
Now, given the expressions p and p' representing pixel coordinates:
p'=[u',v',1] and p=[u,v,1]
the fundamental matrix:
F = [f1,f2,f3]
[f4,f5,f6]
[f7,f8,f9]
and the fundamental matrix equation:
p'^T F p = 0
when we multiply this expression out in algebraic form, we get the following:
uu'f1 + vu'f2 + u'f3 + uv'f4 + vv'f5 + v'f6 + uf7 + vf8 + f9 = 0.
Writing this as a homogeneous system of linear equations Af = 0 (basically a factorization of the above formula), we get two components, A and f.
A:
[uu',vu',u', uv',vv',v',u,v,1]
f (f is essentially the fundamental matrix in vector form):
[f1,f2,f3,f4,f5,f6,f7,f8,f9]
Now if we look at the components of vector A, we have 8 unknowns and one known value, 1, because of homogeneous coordinates, and therefore we only need 8 equations now.
Reason 2
det F = 0.
A determinant is a value that can be obtained from a square matrix.
I'm not entirely sure about the mathematical details of this property but I can still infer the basic idea, and, hopefully, you can as well.
Basically given some matrix A
A = [a,b,c]
[d,e,f]
[g,h,i]
The determinant can be computed using this formula:
det A = aei+bfg+cdh-ceg-bdi-afh
If we look at the determinant using the fundamental matrix, the algebra would look something like this:
F = [f1,f2,f3]
[f4,f5,f6]
[f7,f8,f9]
det F = (f1*f5*f9)+(f2*f6*f7)+(f3*f4*f8)-(f3*f5*f7)-(f2*f4*f9)-(f1*f6*f8)
Now we know the determinant of the fundamental matrix is zero:
det F = (f1*f5*f9)+(f2*f6*f7)+(f3*f4*f8)-(f3*f5*f7)-(f2*f4*f9)-(f1*f6*f8) = 0
So, if we work out only 7 of the remaining parameters of the fundamental matrix, we can work out the last one using the above determinant equation.
Therefore the fundamental matrix has 7 DOF.
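Summarizing the count in one line (my condensation of the two reasons above):

$$\underbrace{9}_{\text{entries of } F} \;-\; \underbrace{1}_{\text{overall scale}} \;-\; \underbrace{1}_{\det F = 0} \;=\; 7 \text{ degrees of freedom.}$$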
The reasons why F has only 7 degrees of freedom are
F is a 3x3 homogeneous matrix. Homogeneous means there is a scale ambiguity in the matrix, so the scale doesn't matter (as shown in @Curator Corpus's example). This drops one degree of freedom.
F is a matrix with rank 2. It is not a full rank matrix, so it is singular and its determinant is zero (Proof here). The reason why F is a matrix with rank 2 is that it is mapping a 2D plane (image1) to all the lines (in image 2) that pass through the epipole (of image 2).
Hope it helps.
As for the highest-voted answer by nbro, I think it can be interpreted this way: per reason two, matrix F has rank 2, so its determinant is zero, and that acts as a constraint on the entries. So we only need 7 points to determine the rest of the variables (f1-f8), given that constraint. That makes 8 equations for 8 variables, leaving only one solution. So there are 7 DOF.

What input x maximizes the activation function in an autoencoder hidden layer?

Hi, while reading Stanford's machine learning material about autoencoders, I found a formula that is hard to prove by myself. Link to Material
Question is:
" What input image x would cause ai to be maximally activated? "
Screen shot of the Question and Context:
Many thanks for your answers in advance!
While this can be rigorously solved using KKT conditions and Lagrange multipliers, there is a more intuitive way to figure out the result. I assume that f(.) is a monotone increasing, sigmoid type of nonlinearity (ReLU is also valid). So, finding the maximum of w1x1+...+w100x100 + b under the constraint (x1)^2+...+(x100)^2 <= 1 is equivalent to finding the maximum of f(w1x1+...+w100x100 + b) under the same constraint.
Note that g = w1x1+...+w100x100 + b is a linear function of the x terms (name it g, so we can refer to it later). The direction of largest increase of that function is the same at any point (x1,...,x100) in its domain: the gradient, which is simply (w1,w2,...,w100) everywhere. This means that if we move in the direction of (w1,w2,...,w100), independent of where we start, we obtain the largest increase in the function. To make things simpler and to allow us to visualize, assume that we are in R^2 and the function is w1x1 + w2x2 + b:
The optimal x1 and x2 are constrained to lie in or on the circle C: (x1)^2 + (x2)^2 = 1. Assume that we are at the origin (0,0). If we go in the direction of the gradient (w1,w2) (the blue arrow), we attain the largest value of the function where the blue arrow intersects the circle. That intersection has the coordinates c*(w1,w2) with c^2(w1^2 + w2^2) = 1, where c is a scalar coefficient. c is easily solved as c = 1 / sqrt(w1^2 + w2^2). Then at the intersection we have x1 = w1/sqrt(w1^2 + w2^2) and x2 = w2/sqrt(w1^2 + w2^2), which is the solution we seek. This can be extended in the same way to the 100-dimensional case.
You may ask why we started at the origin and not some other point in the circle. Note that the red line is perpendicular to the gradient vector, and the function is constant along that line. Draw that line (u1,u2), preserving its orientation, anywhere such that it intersects the circle C, and then choose any point on it that lies within the circle. Wherever you start on the (u1,u2) line, you start at the same value of g. Then, as you go in the (w1,w2) direction, the longest path available within the circle always goes through the origin, which is the path along which g increases the most.
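Written compactly (my restatement of the result derived above): the maximizing input under the unit-norm constraint is the normalized weight vector,

$$x_i^{*} = \frac{w_i}{\sqrt{\sum_{j=1}^{100} w_j^{2}}}, \qquad i = 1, \ldots, 100,$$

i.e. x* = w / ||w||_2.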
