Why is the Harris matrix positive semi-definite? - image-processing

I'm learning the "Harris Corner Detector" algorithm,
and I'm stuck on why the Harris matrix is positive semi-definite.
Since the Harris matrix's trace is positive, I can tell that its two eigenvalues are either both positive, or one positive and one negative.
So, how can I derive that the Harris matrix is positive semi-definite?

The matrix generated in the computation of the Harris corner detector is the structure tensor (see here on Wikipedia). The structure tensor M is a matrix created by the outer product of the gradient field g with itself:
g = gradient( image );
M = smooth( g * g' );
(with smooth the local smoothing applied).
Without any smoothing, g * g' would always have one non-negative eigenvalue and one 0 eigenvalue, by construction. You can see this by writing out the determinant of the resulting matrix, which is always 0, meaning that one of the eigenvalues must be 0 (their product is the determinant). The other one must be non-negative because the trace is the sum of two squares; since one eigenvalue is 0, the other eigenvalue must be equal to the trace (which is positive wherever the gradient is non-zero).
The local smoothing adds together several such matrices (weighted addition). Adding together positive semi-definite matrices leads to a positive-semi-definite matrix: if v'*A*v>=0, and v'*B*v>=0, then v'*(A+B)*v>=0.
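The argument above can be checked numerically. The following is a minimal sketch in plain Python, not part of the Harris detector itself: the gradient samples and smoothing weights are made-up values, and the helper names (`outer`, `weighted_sum`, `eigenvalues_sym2`) are hypothetical.

```python
import math

def outer(g):
    """Return the 2x2 outer product g * g' for a gradient g = (gx, gy)."""
    gx, gy = g
    return [[gx * gx, gx * gy], [gx * gy, gy * gy]]

def weighted_sum(mats, weights):
    """Weighted sum of 2x2 matrices (what the smoothing does at one pixel)."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for a, w in zip(mats, weights):
        for i in range(2):
            for j in range(2):
                m[i][j] += w * a[i][j]
    return m

def eigenvalues_sym2(m):
    """Eigenvalues of a symmetric 2x2 matrix, smallest first."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # tr^2/4 - det is non-negative for symmetric matrices; clamp rounding noise
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return tr / 2.0 - disc, tr / 2.0 + disc

gradients = [(1.0, 2.0), (-3.0, 0.5), (0.0, 4.0)]  # made-up gradient samples
weights = [0.25, 0.5, 0.25]                        # made-up, non-negative weights

# Single outer product: one eigenvalue is 0, the other equals the trace
lam_single = eigenvalues_sym2(outer(gradients[0]))

# Smoothed structure tensor: both eigenvalues are non-negative
M = weighted_sum([outer(g) for g in gradients], weights)
lam_M = eigenvalues_sym2(M)
```

Note that the argument relies on the smoothing weights being non-negative, as they are for a Gaussian or box filter.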

Why do we normalize homography or fundamental matrix?

I want to know why we normalize the homography or fundamental matrix. Here is the code in particular.
H = H * (1.0 / H[2, 2]) # Normalization step. H is [3, 3] matrix.
I can understand that we have to normalize the data before computing the SVD because of the instability of linear least squares, but why do we normalize it at the end?
A homography between two planes has 8 degrees of freedom by definition, mapping from one plane to another using perspective. Such a homography can be defined by giving the images of four points, which makes eight coordinates (scalars).
A 3x3 matrix has 9 elements, so it has 9 degrees of freedom. That is one degree more than needed for a homography.
The homography doesn't change when the matrix is scaled (multiplied by a non-zero scalar). All the math works the same, so you don't strictly need to normalize your homography matrix.
It is still a good idea to normalize.
For one, it makes the arithmetic somewhat tamer. Have some wikipedia links to fields of study because weaving all these into a coherent sentence... doesn't add anything:
Numerical analysis, Condition number, Floating-point arithmetic, Numerical error, Numerical stability, ...
Also, normalization makes the matrix easier for humans to interpret. The most common normalization is to scale the matrix such that the last element becomes 1. That is convenient because this whole math happens in a projective space, where the projection causes points to be mapped to the w=1 plane, making vectors have a 1 for the last element.
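To illustrate the scale invariance, here is a small sketch in plain Python. The matrix values and the helper names `apply_homography` and `normalize` are made up for illustration; they are not taken from any library.

```python
def apply_homography(H, x, y):
    """Map the 2D point (x, y) through the 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # divide by w to come back to the w = 1 plane

def normalize(H):
    """Scale H so that its last element becomes 1 (requires H[2][2] != 0)."""
    s = H[2][2]
    return [[h / s for h in row] for row in H]

H = [[2.0, 0.0, 4.0], [0.0, 2.0, -6.0], [0.0, 0.5, 2.0]]  # made-up matrix
H_scaled = [[-7.5 * h for h in row] for row in H]          # same homography

p = apply_homography(H, 3.0, 2.0)
p_scaled = apply_homography(H_scaled, 3.0, 2.0)
p_norm = apply_homography(normalize(H_scaled), 3.0, 2.0)
```

All three calls map the point to the same place, because the common factor cancels in the division by w. Normalizing by the last element only works when that element is not zero (or very close to it); scaling to unit Frobenius norm is an alternative in that case.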
How is the homography matrix provided to you?
For example, consider the case where some library function calculates the homography matrix and provides it to you,
and the function specification doesn't mention the scale...
In an extreme case, the function could be implemented as:
Matrix3x3 CalculateHomographyMatrix( some arguments )
{
    Matrix3x3 H = ...; // Homography calculation
    return Non_Zero_Random_Value * H; // Wow!
}
Element values may become very large or very small, and using such values in your processing may cause problems (floating point computation errors).

3D reconstruction from two calibrated cameras - where is the error in this pipeline?

There are many posts about 3D reconstruction from stereo views of known internal calibration, some of which are excellent. I have read a lot of them, and based on what I have read I am trying to compute my own 3D scene reconstruction with the below pipeline / algorithm. I'll set out the method then ask specific questions at the bottom.
0. Calibrate your cameras:
This means retrieve the camera calibration matrices K1 and K2 for Camera 1 and Camera 2. These are 3x3 matrices encapsulating each camera's internal parameters: focal length, principal point offset / image centre. These don't change, you should only need to do this once, well, for each camera as long as you don't zoom or change the resolution you record in.
Do this offline. Do not argue.
I'm using OpenCV's CalibrateCamera() and checkerboard routines, but this functionality is also included in the Matlab Camera Calibration toolbox. The OpenCV routines seem to work nicely.
1. Fundamental Matrix F:
With your cameras now set up as a stereo rig, determine the fundamental matrix (3x3) of that configuration using point correspondences between the two images/views.
How you obtain the correspondences is up to you and will depend a lot on the scene itself.
I am using OpenCV's findFundamentalMat() to get F, which provides a number of options method wise (8-point algorithm, RANSAC, LMEDS).
You can test the resulting matrix by plugging it into the defining equation of the Fundamental matrix: x'Fx = 0 where x' and x are the raw image point correspondences (x, y) in homogeneous coordinates (x, y, 1) and one of the three-vectors is transposed so that the multiplication makes sense. The nearer to zero for each correspondence, the better F is obeying its relation. This is equivalent to checking how well the derived F actually maps from one image plane to another. I get an average deflection of ~2px using the 8-point algorithm.
2. Essential Matrix E:
Compute the Essential matrix directly from F and the calibration matrices.
E = K2.t * F * K1
3. Internal Constraint upon E:
E should obey certain constraints. In particular, if decomposed by SVD into U*S*V.t then its singular values should be a, a, 0. The first two diagonal elements of S should be equal, and the third zero.
I was surprised to read here that if this is not true when you test for it, you might choose to fabricate a new Essential matrix from the prior decomposition like so: E_new = U * diag(1,1,0) * V.t which is of course guaranteed to obey the constraint. You have essentially set S = diag(1, 1, 0) artificially.
4. Full Camera Projection Matrices:
There are two camera projection matrices P1 and P2. These are 3x4 and obey the x = PX relation. Also, P = K[R|t] and therefore K_inv.P = [R|t] (where the camera calibration has been removed).
The first matrix P1 (excluding the calibration matrix K) can be set to [I|0], then P2 (excluding K) is [R|t].
Compute the Rotation and translation between the two cameras R, t from the decomposition of E. There are two possible ways to calculate R (U*W*V.t and U*W.t*V.t) and two ways to calculate t (±third column of U), which means that there are four combinations of Rt, only one of which is valid.
Compute all four combinations, and choose the one that geometrically corresponds to the situation where a reconstructed point is in front of both cameras. I actually do this by carrying through and calculating the resulting P2 = [R|t] and triangulating the 3d position of a few correspondences in normalised coordinates to ensure that they have a positive depth (z-coord)
5. Triangulate in 3D
Finally, combine the recovered 3x4 projection matrices with their respective calibration matrices: P'1 = K1 * P1 and P'2 = K2 * P2
And triangulate the 3-space coordinates of each 2d point correspondence accordingly, for which I am using the LinearLS method from here.
QUESTIONS:
Are there any howling omissions and/or errors in this method?
My F matrix is apparently accurate (0.22% deflection in the mapping compared to typical coordinate values), but when testing E against x'Ex = 0 using normalised image correspondences the typical error in that mapping is >100% of the normalised coordinates themselves. Is testing E against x'Ex = 0 valid, and if so where is that jump in error coming from?
The error in my fundamental matrix estimation is significantly worse when using RANSAC than the 8pt algorithm, ±50px in the mapping between x and x'. This deeply concerns me.
'Enforcing the internal constraint' still sits very weirdly with me - how can it be valid to just manufacture a new Essential matrix from part of the decomposition of the original?
Is there a more efficient way of determining which combo of R and t to use than calculating P and triangulating some of the normalised coordinates?
My final re-projection error is hundreds of pixels in 720p images. Am I likely looking at problems in the calibration, determination of P-matrices or the triangulation?
The error in my fundamental matrix estimation is significantly worse
when using RANSAC than the 8pt algorithm, ±50px in the mapping between
x and x'. This deeply concerns me.
Using the 8pt algorithm does not exclude using the RANSAC principle.
When using the 8pt algorithm directly which points do you use? You have to choose 8 (good) points by yourself.
In theory you can compute a fundamental matrix from any point correspondences, but you often get a degenerate fundamental matrix because the linear equations are not independent. Another point is that the 8pt algorithm solves an overdetermined system of linear equations in a least-squares sense, so a single outlier can destroy the fundamental matrix.
Have you tried to use the RANSAC result? I bet it represents one of the correct solutions for F.
My F matrix is apparently accurate (0.22% deflection in the mapping
compared to typical coordinate values), but when testing E against
x'Ex = 0 using normalised image correspondences the typical error in
that mapping is >100% of the normalised coordinates themselves. Is
testing E against x'Ex = 0 valid, and if so where is that jump in error
coming from?
Again, if F is degenerate, x'Fx = 0 can hold for every point correspondence.
Another reason for your incorrect E may be that the cameras are switched (K1.t * F * K2 instead of K2.t * F * K1). Remember to check: x'Ex = 0
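A synthetic test can help separate these causes. The sketch below, in plain Python, builds a known stereo pair (identity rotation, translation along x, both chosen arbitrarily), derives F from it, and then checks the residuals. The calibration values and all helper names are assumptions made for illustration; the relation E = K2' * F * K1 is the one from step 2 of the question.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def epipolar_residual(m, x2, x1):
    """Return x2' * m * x1 for homogeneous 3-vectors x1 (view 1), x2 (view 2)."""
    mx = matvec(m, x1)
    return sum(x2[i] * mx[i] for i in range(3))

def calib(f, cx, cy):
    """Calibration matrix and its inverse (square pixels, no skew assumed)."""
    k = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    k_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return k, k_inv

K1, K1_inv = calib(800.0, 640.0, 360.0)   # made-up intrinsics
K2, K2_inv = calib(1000.0, 600.0, 400.0)

# Synthetic ground truth: R = I, t = (1, 0, 0), so E = [t]_x
E_true = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
F = matmul(matmul(transpose(K2_inv), E_true), K1_inv)

E = matmul(matmul(transpose(K2), F), K1)          # E = K2' * F * K1
E_swapped = matmul(matmul(transpose(K1), F), K2)  # cameras mixed up

X = [0.5, -0.2, 4.0]                        # 3D point in camera-1 coordinates
x1 = matvec(K1, X)
x2 = matvec(K2, [X[0] + 1.0, X[1], X[2]])   # same point in camera 2 (X + t)
x1 = [c / x1[2] for c in x1]                # pixel coordinates (x, y, 1)
x2 = [c / x2[2] for c in x2]

n1 = matvec(K1_inv, x1)                     # normalised coordinates
n2 = matvec(K2_inv, x2)

res_F = epipolar_residual(F, x2, x1)        # pixel coordinates with F
res_E = epipolar_residual(E, n2, n1)        # normalised coordinates with E
res_swapped = epipolar_residual(E_swapped, n2, n1)
```

With the correct order, both residuals vanish up to rounding, while the swapped version does not. Note that E must be tested with normalised coordinates (K_inv * x); feeding raw pixel coordinates into x'Ex is another way to get a large error.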
'Enforcing the internal constraint' still sits very weirdly with me -
how can it be valid to just manufacture a new Essential matrix from
part of the decomposition of the original?
It is explained in 'Multiple View Geometry in Computer Vision' by Hartley and Zisserman. As far as I know, it has to do with minimizing the Frobenius norm of the difference between the estimated matrix and the constrained one.
You can Google it and there are pdf resources.
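As a sketch of that argument (my reading of the result, with the singular values ordered so that σ1 ≥ σ2 ≥ σ3): replacing the singular values while keeping U and V gives the closest matrix, in the Frobenius norm, that has singular values (s, s, 0).

```latex
\begin{aligned}
E  &= U \,\operatorname{diag}(\sigma_1, \sigma_2, \sigma_3)\, V^{\top},
      \qquad \sigma_1 \ge \sigma_2 \ge \sigma_3 \ge 0, \\
E' &= U \,\operatorname{diag}(s, s, 0)\, V^{\top},
      \qquad s = \tfrac{1}{2}(\sigma_1 + \sigma_2), \\
\lVert E - E' \rVert_F^2
   &= (\sigma_1 - s)^2 + (\sigma_2 - s)^2 + \sigma_3^2
    = \tfrac{1}{2}(\sigma_1 - \sigma_2)^2 + \sigma_3^2 .
\end{aligned}
```

Because E is only defined up to scale, using diag(1, 1, 0) instead of diag(s, s, 0) describes the same essential matrix. The last line also gives a quick diagnostic: if the distance is large compared to the norm of E, the estimate was far from a valid essential matrix to begin with.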
Is there a more efficient way of determining which combo of R and t to
use than calculating P and triangulating some of the normalised
coordinates?
Not as far as I know.
My final re-projection error is hundreds of pixels in 720p images. Am
I likely looking at problems in the calibration, determination of
P-matrices or the triangulation?
Your rigid body transformation P2 is incorrect because E is incorrect.

Geometric representation of Perceptrons (Artificial neural networks)

I am taking this course on Neural networks in Coursera by Geoffrey Hinton (not current).
I have a very basic doubt on weight spaces.
https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides%2Flec2.pdf
Page 18.
If I have a weight vector (bias is 0) as [w1=1,w2=2] and training case as {1,2,-1} and {2,1,1}
where I guess {1,2} and {2,1} are the input vectors. How can it be represented geometrically?
I am unable to visualize it. Why does a training case give a plane which divides the weight space into 2? Could somebody explain this in a coordinate system of 3 dimensions?
The following is the text from the ppt:
1. Weight-space has one dimension per weight.
2. A point in the space has a particular setting for all the weights.
3. Assuming that we have eliminated the threshold, each training case could be represented as a hyperplane through the origin.
My doubt is in the third point above. Kindly help me understand.
It's probably easier to explain if you look deeper into the math. Basically, a single layer of a neural net performs some function on your input vector, transforming it into a different vector space.
You don't want to jump right into thinking of this in 3-dimensions. Start smaller, it's easy to make diagrams in 1-2 dimensions, and nearly impossible to draw anything worthwhile in 3 dimensions (unless you're a brilliant artist), and being able to sketch this stuff out is invaluable.
Let's take the simplest case, where you're taking in an input vector of length 2, you have a weight vector of dimension 2x1, which implies an output vector of length one (effectively a scalar)
In this case it's pretty easy to imagine that you've got something of the form:
input = [x, y]
weight = [a, b]
output = ax + by
If we assume that weight = [1, 3], we can see, and hopefully intuit that the response of our perceptron will be something like this:
With the behavior being largely unchanged for different values of the weight vector.
It's easy to imagine then, that if you're constraining your output to a binary space, there is a plane, maybe 0.5 units above the one shown above that constitutes your "decision boundary".
As you move into higher dimensions this becomes harder and harder to visualize, but if you imagine that that plane shown isn't merely a 2-d plane, but an n-d plane or a hyperplane, you can imagine that this same process happens.
Since actually creating the hyperplane requires either the input or output to be fixed, you can think of giving your perceptron a single training value as creating a "fixed" [x,y] value. This can be used to create a hyperplane. Sadly, this cannot be effectively visualized, as 4-d drawings are not really feasible in a browser.
Hope that clears things up, let me know if you have more questions.
I have encountered this question on SO while preparing a large article on linear combinations (it's in Russian, https://habrahabr.ru/post/324736/). It has a section on the weight space and I would like to share some thoughts from it.
Let's take a simple case of linearly separable dataset with two classes, red and green:
The illustration above is in the dataspace X, where samples are represented by points and the weight coefficients constitute a line. It could be conveyed by the following formula:
w^T * x + b = 0
But we can rewrite it vice-versa making x component a vector-coefficient and w a vector-variable:
x^T * w + b = 0
because dot product is symmetrical. Now it could be visualized in the weight space the following way:
where red and green lines are the samples and blue point is the weight.
More possible weights are limited to the area below (shown in magenta):
which could be visualized in dataspace X as:
Hope it clarifies dataspace/weightspace correlation a bit. Feel free to ask questions, will be glad to explain in more detail.
The "decision boundary" for a single layer perceptron is a plane (hyper plane)
where n in the image is the weight vector w, in your case w={w1=1,w2=2}=(1,2), and the direction specifies which side is the right side. n is orthogonal (90 degrees) to the plane.
A plane always splits a space into 2 naturally (extend the plane to infinity in each direction)
You can also try to input different values into the perceptron and find where the response is zero (only on the decision boundary).
Recommend you read up on linear algebra to understand it better:
https://www.khanacademy.org/math/linear-algebra/vectors_and_spaces
For a perceptron with 1 input & 1 output layer, there can only be 1 LINEAR hyperplane. And since there is no bias, the hyperplane won't be able to shift in an axis and so it will always share the same origin point. However, if there is a bias, they may not share a same point anymore.
I think a training case can be represented as a hyperplane for the following reason.
Let's say
[j,k] is the weight vector and
[m,n] is the training-input
training-output = jm + kn
Given that a training case in this perspective is fixed and the weights vary, the training-input (m, n) becomes the coefficient and the weights (j, k) become the variables.
Just as in any text book where z = ax + by is a plane,
training-output = jm + kn is also a plane defined by training-output, m, and n.
The equation of a plane passing through the origin is written in the form:
ax + by + cz = 0
If a = 1, b = 2, c = 3, the equation of the plane can be written as:
x + 2y + 3z = 0
So, in XYZ space, the equation is: x + 2y + 3z = 0
Now, in the weight space, every dimension represents a weight. So, if the perceptron has 10 weights, the weight space will be 10-dimensional.
Equation of the perceptron: ax + by + cz <= 0 ==> Class 0
ax + by + cz > 0 ==> Class 1
In this case, a, b and c are the weights; x, y and z are the input features.
In the weight space, a, b and c are the variables (axes).
So, for every training example, e.g. (x, y, z) = (2, 3, 4), a hyperplane is formed in the weight space whose equation is:
2a + 3b + 4c = 0
passing through the origin.
I hope you understand it now.
Consider we have 2 weights. So w = [w1, w2]. Suppose we have input x = [x1, x2] = [1, 2]. If you use the weight to do a prediction, you have z = w1*x1 + w2*x2 and prediction y = z > 0 ? 1 : 0.
Suppose the label for the input x is 1. Thus, we hope y = 1, and so we want z = w1*x1 + w2*x2 > 0. In vector form, z = w^T * x, so we want w^T * x > 0. The geometric interpretation of this expression is that the angle between w and x is less than 90 degrees. For example, the green vector is a candidate for w that would give the correct prediction of 1 in this case. Actually, any vector that lies on the same side of the line w1 + 2 * w2 = 0 as the green vector would give the correct answer. However, if it lies on the other side, as the red vector does, then it would give the wrong answer.
However, suppose the label is 0. Then the case would just be the reverse.
The above case gives the intuition and illustrates the 3 points in the lecture slide. The training case x determines the plane, and depending on the label, the weight vector must lie on one particular side of the plane to give the correct answer.
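The numbers from the question can be plugged into this picture. The sketch below is plain Python and makes two assumptions that the question does not state explicitly: the third entry of each training case is the target, and a target of -1 means "the output should not be positive". The function name `on_correct_side` is hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def on_correct_side(w, x, label_is_positive):
    """True if weight vector w classifies input x correctly (no bias).

    In weight space, the input x is the normal of the hyperplane x . w = 0;
    the label decides on which side of that hyperplane w has to lie.
    """
    z = dot(w, x)
    return z > 0 if label_is_positive else z <= 0

# Training cases from the question: {1,2,-1} and {2,1,1}
cases = [((1.0, 2.0), False), ((2.0, 1.0), True)]  # (input, positive label?)

w = (1.0, 2.0)                                     # weights from the question
results = [on_correct_side(w, x, lab) for x, lab in cases]

# A point in the intersection of both half-planes, found by hand
w_ok = (1.0, -1.0)
results_ok = [on_correct_side(w_ok, x, lab) for x, lab in cases]
```

Under these assumptions the weights [1, 2] are on the wrong side of the first training case's line and on the right side of the second, while (1, -1) satisfies both. Each training case contributes one line through the origin of the (w1, w2) plane, and the acceptable weights are the intersection of the corresponding half-planes.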

multi layer perceptron - finding the "separating" curve

With a single-layer perceptron it's easy to find the equation of the "separating line" (I don't know the professional term), the line that separates the 2 types of points, based on the perceptron's weights after it was trained. How can I find, in a similar way, the equation of the curve (not a straight line) that separates the 2 types of points in a multi-layer perceptron?
Thanks.
This is only an attempt to get an approximation to the separating boundary or curve.
Dataset
Below I plotted the separating curve between the two types of the example dataset. The dataset is borrowed from coursera - Andrew Ng's machine learning course. Also the code snippet below borrows the ideas from Ex6 of Andrew's ML course.
Boundary Plot
To plot the separating curve,
You first train your neural network against your training data;
Generate a 2d grid of data using the granularity you want, in Matlab, this is something like:
x1plot = linspace(min(X(:,1)), max(X(:,1)), 100)';
x2plot = linspace(min(X(:,2)), max(X(:,2)), 100)';
[X1, X2] = meshgrid(x1plot, x2plot);
For each data point in the grid, calculate the predicted label using your neural network;
Draw the contour graph of the resulting labels on the grid:
vals = zeros(size(X1));
for i = 1:size(X1, 2)
    this_X = [X1(:, i), X2(:, i)];
    % mlpPredict() is the function to use your trained neural network model
    % to get a predicted label.
    vals(:, i) = mlpPredict(model, this_X);
end
% Plot the boundary. Assuming mlpPredict() returns labels 0 or 1, the
% boundary lies at level 0.5 (a level of 0 would suit a signed score).
hold on
[C, Lev] = contour(X1, X2, vals, [0.5 0.5], 'Color', 'b');
hold off;
If your goal is only to get the exact mathematical representation of the boundary curve, this method won't work. This method can only give you an approximation of the curve up to the granularity you set up in your grid.
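The same grid idea can be sketched without any plotting library. The following plain Python example is only an illustration: `predict` is a stand-in for a trained network (here it simply labels points inside the unit circle as 1), and `linspace` and `boundary_points` are hypothetical helpers.

```python
import math

def predict(x1, x2):
    """Stand-in for a trained network: label 1 inside the unit circle, else 0."""
    return 1 if x1 * x1 + x2 * x2 < 1.0 else 0

def linspace(lo, hi, n):
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def boundary_points(predict_fn, xs, ys):
    """Midpoints between neighbouring grid nodes whose predicted labels differ."""
    labels = [[predict_fn(x, y) for x in xs] for y in ys]
    pts = []
    for r in range(len(ys)):
        for c in range(len(xs)):
            # Label change between this node and its right-hand neighbour
            if c + 1 < len(xs) and labels[r][c] != labels[r][c + 1]:
                pts.append(((xs[c] + xs[c + 1]) / 2.0, ys[r]))
            # Label change between this node and the neighbour in the next row
            if r + 1 < len(ys) and labels[r][c] != labels[r + 1][c]:
                pts.append((xs[c], (ys[r] + ys[r + 1]) / 2.0))
    return pts

xs = linspace(-2.0, 2.0, 101)
ys = linspace(-2.0, 2.0, 101)
pts = boundary_points(predict, xs, ys)

# Distance of the recovered points from the true boundary (the unit circle)
max_err = max(abs(math.hypot(x, y) - 1.0) for x, y in pts)
```

The recovered points are at most about half a grid step away from the true boundary, which makes the trade-off explicit: a finer grid gives a better approximation at the cost of more calls to the network.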
If you do want a precise description of the boundary, SVM might be a good alternative since the whole set of support vectors could serve as the boundary description.
Approximate boundary using contour points
I took a look at octave's documentation about contour. Basically, contour uses the contour matrix C computed by contourc from the same arguments. Here is the signature of contourc:
[C, LEV] = contourc (X, Y, Z, VN)
This function computes contour lines of the matrix Z. Parameters X, Y and VN are optional.
The return value LEV is a vector of the contour levels. The
return value C is a 2 by N matrix containing the contour lines in
the following format
C = [lev1, x1, x2, ..., levn, x1, x2, ...
len1, y1, y2, ..., lenn, y1, y2, ...]
in which contour line N has a level (height) of LEVN and length of
LENN.
So if you do want to get an analytical description of the curve, matrix C should contain enough information about it. In my sample plot, after parsing of C, I get 30 levels. The coordinates of the first 6 points in the first level are listed below:
x: 2.3677e-01 2.3764e-01 2.4640e-01 2.4640e-01 2.4640e-01 2.4640e-01 ...
y: 4.0263e-01 4.0855e-01 4.0909e-01 4.1447e-01 4.2039e-01 4.2631e-01 ...
Please notice that they are exactly the points on the contour starting from (0.23677, 0.40263). Using these contour points, it's straightforward to approximate the curve using multiple line segments (because each line segment can be determined by two end points).
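Parsing the contour matrix into line segments could look roughly like this. The sketch is in plain Python rather than Octave, it assumes C has been exported as two lists (the two rows of the matrix) in the format quoted above, and the function name `parse_contour_matrix` and the sample values are made up.

```python
def parse_contour_matrix(C):
    """Split a 2 x N contour matrix into a list of (level, [(x, y), ...])."""
    top, bottom = C
    lines = []
    i = 0
    while i < len(top):
        level = top[i]            # header column: contour level ...
        count = int(bottom[i])    # ... and number of points that follow
        xs = top[i + 1:i + 1 + count]
        ys = bottom[i + 1:i + 1 + count]
        lines.append((level, list(zip(xs, ys))))
        i += 1 + count            # jump to the next header column
    return lines

# Made-up example: two contour lines at level 0.5 with 3 and 2 points
C = [
    [0.5, 0.0, 1.0, 2.0, 0.5, 5.0, 6.0],
    [3,   0.0, 1.0, 0.0, 2,   5.0, 5.0],
]
lines = parse_contour_matrix(C)

# Consecutive points of each contour line form the approximating segments
segments = [list(zip(pts, pts[1:])) for _, pts in lines]
```

Each entry of `segments` is a polyline given as pairs of end points, which is the piecewise-linear approximation of the boundary described above.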
Hope it helps.

Why is weight vector orthogonal to decision plane in neural networks

I am a beginner in neural networks. I am learning about perceptrons.
My question is: why is the weight vector perpendicular to the decision boundary (hyperplane)?
I referred to many books, and all of them mention that the weight vector is perpendicular to the decision boundary, but none of them say why.
Can anyone give me an explanation or a reference to a book?
The weights are simply the coefficients that define a separating plane. For the moment, forget about neurons and just consider the geometric definition of a plane in N dimensions:
w1*x1 + w2*x2 + ... + wN*xN - w0 = 0
You can also think of this as being a dot product:
w*x - w0 = 0
where w and x are both length-N vectors. This equation holds for all points on the plane. Recall that we can multiply the above equation by a constant and it still holds so we can define the constants such that the vector w has unit length. Now, take out a piece of paper and draw your x-y axes (x1 and x2 in the above equations). Next, draw a line (a plane in 2D) somewhere near the origin. w0 is simply the perpendicular distance from the origin to the plane and w is the unit vector that points from the origin along that perpendicular. If you now draw a vector from the origin to any point on the plane, the dot product of that vector with the unit vector w will always be equal to w0 so the equation above holds, right? This is simply the geometric definition of a plane: a unit vector defining the perpendicular to the plane (w) and the distance (w0) from the origin to the plane.
Now our neuron is simply representing the same plane as described above but we just describe the variables a little differently. We'll call the components of x our "inputs", the components of w our "weights", and we'll call the distance w0 a bias. That's all there is to it.
Getting a little beyond your actual question, we don't really care about points on the plane. We really want to know which side of the plane a point falls on. While w*x - w0 is exactly zero on the plane, it will have positive values for points on one side of the plane and negative values for points on the other side. That's where the neuron's activation function comes in but that's beyond your actual question.
Intuitively, in a binary problem the weight vector points in the direction of the '1'-class, while the '0'-class is found when pointing away from the weight vector. The decision boundary should thus be drawn perpendicular to the weight vector.
See the image for a simplified example: You have a neural network with only 1 input which thus has 1 weight. If the weight is -1 (the blue vector), then all negative inputs will become positive, so the whole negative spectrum will be assigned to the '1'-class, while the positive spectrum will be the '0'-class. The decision boundary in a 2-axis plane is thus a vertical line through the origin (the red line). Simply said it is the line perpendicular to the weight vector.
Let's go through this example with a few values. The output of the perceptron is class 1 if the sum of all inputs * weights is larger than 0 (the default threshold), otherwise if the output is smaller than the threshold of 0 then the class is 0. Your input has value 1. The weight applied to this single input is -1, so 1 * -1 = -1 which is less than 0. The input is thus assigned class 0 (NOTE: class 0 and class 1 could have just been called class A or class B, don't confuse them with the input and weight values). Conversely, if the input is -1, then input * weight is -1 * -1 = 1, which is larger than 0, so the input is assigned to class 1. If you try every input value then you will see that all the negative values in this example have an output larger than 0, so all of them belong to class 1. All positive values will have an output smaller than 0 and therefore will be classified as class 0. Draw the line which separates all positive and negative input values (the red line) and you will see that this line is perpendicular to the weight vector.
Also note that the weight vector is only used to modify the inputs to fit the wanted output. What would happen without a weight vector? An input of 1, would result in an output of 1, which is larger than the threshold of 0, so the class is '1'.
The second image on this page shows a perceptron with 2 inputs and a bias. The first input has the same weight as my example, while the second input has a weight of 1. The corresponding weight vector together with the decision boundary are thus changed as seen in the image. Also the decision boundary has been translated to the right due to an added bias of 1.
Here is a viewpoint from a more fundamental linear algebra/calculus standpoint:
The general equation of a plane is Ax + By + Cz = D (can be extended for higher dimensions). The normal vector can be extracted from this equation: [A B C]; it is the vector orthogonal to every other vector that lies on the plane.
Now if we have a weight vector [w1 w2 w3], then we want w^T * x >= 0 (to get positive classification) and w^T * x < 0 (to get negative classification). WLOG, we can also use w^T * x >= d. Now, do you see where I am going with this?
The weight vector is the same as the normal vector from the first section. And as we know, this normal vector (and a point) define a plane: which is exactly the decision boundary. Hence, because the normal vector is orthogonal to the plane, then so too is the weight vector orthogonal to the decision boundary.
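The orthogonality claim itself takes only one line to show. Take any two points x_a and x_b on the decision boundary (the symbols x_a and x_b are introduced here just for the derivation) and subtract their plane equations:

```latex
\begin{aligned}
w^{\top} x_a &= d, \qquad w^{\top} x_b = d \\
\Rightarrow\quad w^{\top} (x_a - x_b) &= d - d = 0 .
\end{aligned}
```

The difference x_a - x_b is an arbitrary direction lying in the boundary, and its dot product with w is zero, so w is orthogonal to every such direction.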
Start with the simplest form, ax + by = 0, weight vector is [a, b], feature vector is [x, y]
Then y = (-a/b)x is the decision boundary with slope -a/b
The weight vector has slope b/a
If you multiply those two slopes together, result is -1
This proves decision boundary is perpendicular to weight vector
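The slope argument breaks down when a or b is zero (one of the slopes is undefined). A dot product avoids that special case. The following plain Python sketch uses made-up weight values, and `boundary_direction` is a hypothetical helper.

```python
def boundary_direction(a, b):
    """A direction vector along the line a*x + b*y = 0."""
    return (b, -a)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

checks = []
for a, b in [(1.0, 3.0), (2.0, -5.0), (0.0, 4.0), (7.0, 0.0)]:
    w = (a, b)                         # weight vector
    d = boundary_direction(a, b)       # direction along the decision boundary
    on_line = a * d[0] + b * d[1]      # d satisfies the boundary equation
    checks.append((on_line, dot(w, d)))
```

Every pair in `checks` is (0, 0): the statement "d lies on the boundary" and the statement "w is perpendicular to d" are the same equation, including for the vertical and horizontal cases in the list.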
Although the question was asked 2 years ago, I think many students will have the same doubts. I reached this answer because I asked the same question.
Now, just think of X, Y (a Cartesian coordinate system is a coordinate system that specifies each point uniquely in a plane by a pair of numerical coordinates, which are the signed distances from the point to two fixed perpendicular directed lines [from Wikipedia]).
If Y = 3X, in geometry Y is perpendicular to X, then let w = 3, then Y = wX, w = Y/X and if we want to draw the relation between X, w we will have two perpendicular lines just like when we draw the relation between X, Y. So always think of the w-coefficient as perpendicular to X and Y.
