Code for a multiple quadratic (or polynomial) least squares (surface fit)? - image-processing

for a machine vision project I am trying to search image data for quadratic surfaces (f(x,y) = Ax^2+Bx+Cy^2+Dy+Exy+F). My plan is to iterate through regions of data and perform a surface-fit, look at the error, see if it's a continuous surface (which would probably indicate a feature in the image).
I was previously able to find quadratic curves (f(x) = Ax^2+Bx+C) in the image data by sampling lines, by using the equations on this site
Link
this worked well, was promising, but it would be much more useful for my task to find 2-D regions that form continuous surfaces.
I see lots of articles indicating that least squares regressions scales up to multiple dimensions, but I'm not able to find code for this Hopefully there is a "closed form" (non-iterative, just compute from your data points) solution, like described above for 1D data. Does anybody know of some source or pseudocode that accomplishes this? Thanks.
(Sorry if my terminology is a bit off.)

I'm not sure what your background is, but if you know some linear algebra you will find linear least squares on wikipedia useful.
Lets take the following example. Say we have the following image
and we want to know how well this fits to a 2D quadratic function in a least squares sense.
Probably the most straightforward way to solve the problem is to compute the optimal coefficients in a least squares sense, then check the error.
First we need to describe the matrices.
Let X be a matrix containing every x,y coordinate in the image, taking the form
X = [x1 x1^2 y1 y1^2 x1*y1 1;
x2 x2^2 y2 y2^2 x2*y2 1;
...
xN xN^2 yN yN^2 xN*yN 1];
For the example image above, X would be a 100x6 matrix.
Let y be the image intensity values in a vector of the form
y = [img(x1,y1);
img(x2,y2);
...
img(xN,yN)]
In this case y is a 100 element column vector.
We want to minimize the least squares objective function S with respect to the vector of coefficients b
S(b) = |y - X*b|^2
where |.| is the L2 norm and b is the desired coefficients
b = [A;
B;
C;
D;
E;
F]
Taking the vector derivative of S(b) with respect to b, setting to zero, and solving for b leads to the standard least squares solution.
b = inv(X'X)*X'*y
where inv is the matrix inverse, ' is transpose, and * is matrix multiplication.
MATLAB example.
% Generate an image
% define x,y coordinates for each location in the image
[x,y] = meshgrid(1:10,1:10);
% true coefficients
b_true = [0.1 0.5 0.3 -0.4 0.4 124];
% magnitude of noise
P = 2;
% create image
img = b_true(1).*x + b_true(2).*x.^2 + b_true(3).*y + b_true(4).*y.^2 + b_true(5).*x.*y + b_true(6);
noise = P*randn(10,10);
img = img + noise;
% Begin least squares optimization
% create matrices
X = [x(:) x(:).^2 y(:) y(:).^2 x(:).*y(:) ones(size(x(:)))];
y = img(:);
% estimated coefficients
b = (X.'*X)\(X.')*y
% mean square error (expected to be near P^2)
E = 1/numel(y) * sum((y - X*b).^2)
Output
b =
0.0906
0.5093
0.1245
-0.3733
0.3776
124.5412
E =
3.4699
In your application you would probably want to define some threshold such that when E < threshold you accept the image (or image region) as a quadratic polynomial.

Related

Programmatically performing gradient calculation

Let y = Relu(Wx) where W is a 2d matrix representing a linear transformation on x, a vector. Likewise, let m = Zy, where Z is a 2d matrix representing a linear transformation on y. How do I programmatically calculate the gradient of Loss = sum(m^2) with respect to W, where the power means take the element wise power of the resulting vector, and sum means adding all the elements together?
I can work this out slowly mathematically by taking a hypothetical, multiplying it all out, then element-by-element taking the derivative to construct the gradient, but I can't figure out an efficient approach to write a program once the neural network layer becomes >1.
Say, for just one layer (m = Zy, take gradient wrt Z) I could just say
Loss = sum(m^2)
dLoss/dZ = 2m * y
where * is the outer product of the vectors, and I guess this is kind of like normal calculus and it works. Now for 2 layers + activation (gradient wrt W), if I try to do it like "normal" calculus and apply the chain rule I get:
dLoss/dW = 2m * Z * dRelu * x
where dRelu is the derivative of Relu(Wx) except here I have no idea what * means in this case to make it work.
Is there an easy way to calculate this gradient mathematically without basically multiplying it all out and deriving each separate element in the gradient? I'm really unfamiliar with matrix calculus, so if anyone could also give some mathematical intuition, if my attempt is completely wrong, that would be appreciated.
For the sake of convenience, let's ignore the ReLU for a moment. You have an input space X (of some size [dimX]) mapped to an intermediate space Y (of some size [dimY]) mapped to an output space m (of some size [dimM]) You have, then, W: X → Y a matrix of shape [dimY, dimX] and Z: Y → m a matrix of shape [dimM, dimY]. Finally your loss is simply a function that maps your M space to a scalar value.
Let us walk the way backwards. As you correctly said, you want to compute the derivative of the loss w.r.t W and to do so you need to apply the chain rule all the way back. You then have:
dL/dW = dL/dm * dm/dY * dY/dW
dL/dm is of shape [dimm] (a scalar function with derivatives across dimm dimensions)
dm/dY is of shape [dimm, dimY] (an m-dimensional function with derivatives across dimY dimensions)
dY/dW is of shape [dimY, dimW] = [dimY, dimY, dimX] (a y-dimensional function with derivatives across [dimY, dimX] dimensions)
Edit:
To make the last bit more clear, Y consists of dimY different values, so Y can be treated as dimY constituent functions. We need to apply the gradient operator on each of those mini-functions, all with respect to the basis vectors defined by W. More concretely, if W = [[w11, w12], [w21, w22], [w31, w32]] and x = [x1, x2], then Y = [y1, y2, y3] = [w11x1 + w12x2, w21x1 + w22x2, w31x1 + w32x2]. Then W defines a 6d space (3x2) across which we need to differentiate. We have dY/dW = [dy1/dW, dy2/dW, dy3/dW], and also dy1/dW = [[dy1/dw11, dy1/dw12], [dy1/dw21, dy1/dw22], [dy1/dw31, dy1/dw32]] = [[x1,x2],[0,0],[0,0]], a 3x2 matrix. So dY/dW is a [3,3,2] tensor.
As for the multiplication part; the operation here is tensor contraction (essentially matrix multiplication in high dimension spaces). Practically, if you have a high-order tensor A[[a1, a2, a3... ], β] (i.e. a+1 dimensions, the last of which is of size β) and a tensor B[β, [b1, b2...]] (i.e. b+1 dimensions, the first of which is β), their tensor contraction is a matrix C[[a1,a2...], [b1,b2...]] (i.e. a+b dimensions, the β dimension contracted), where C is obtained by summing over element-wise across the shared dimension β (refer to https://docs.scipy.org/doc/numpy/reference/generated/numpy.tensordot.html#numpy.tensordot).
The resulting tensor contraction then is a matrix of shape [dimY, dimX] which can be used to update your W weights. The ReLU which we ignored earlier can easily be thrown in the mix, since ReLU: 1 → 1 is a scalar function applied element-wise on Y.
To summarize, your code would be:
W_gradient = 2m * np.dot(Z, x) * np.e**x/(1+np.e**x))
I just implemented several multiplier neural networks(MLP) from scratch in C++[1], and I think I know what's your pain. And believe me, you don't even need any third party matrix/tensor/automatic differentiation(AD) libraries to do the matrix multiplication or gradient calculation. There are three things you should pay attention to:
There are two kinds of multiplication in the equations: matrix multiplication, and elementwise multiplication, you'll mess up if you denoted them all as a single *.
Use concrete examples, especially concrete numbers as dimensions of your data/matrix/vector to build intuition.
The most powerful tool for programming correctly is dimension compatibility, always don't forget to check dimensions.
Suppose your want to do binary classification and the neural network is input -> h1 -> sigmoid -> h2 -> sigmoid -> loss in which input layer has 1 sample each has 2 features, h1 has 7 neurons, and h2 has 2 neurons. Then:
forward pass:
Z1(1, 7) = X(1, 2) * W1(2, 7)
A1(1, 7) = sigmoid(Z1(1, 7))
Z2(1, 2) = A1(1, 7) * W2(7, 2)
A2(1, 2) = sigmoid(Z2(1, 2))
Loss = 1/2(A2 - label)^2
backward pass:
dA2(1, 2) = dL/dA2 = A2 - label
dZ2(1, 2) = dL/dZ2 = dA2 * dsigmoid(A2_i) -- element wise
dW2(7, 2) = A1(1, 7).T * dZ2(1, 2) -- matrix multiplication
Notice the last equation, the dimension of the gradient of W2 should match W2, which is (7, 2). And the only way to get a (7, 2) matrix is to transpose input A1 and multiply A1 with dZ2, that's dimension compatibility[2].
backward pass continued:
dA1(1, 7) = dZ2(1, 2) * A1(2, 7) -- matrix multiplication
dZ1(1, 7) = dA1(1, 7) * dsigmoid(A1_i) -- element wise
dW1(2, 7) = X.T(2, 1) * dZ1(1, 7) -- matrix multiplication
[1] The code is here, you can see the hidden layer implementation, naive matrix implementation and the references listed there.
[2] I omit the matrix derivation part, it's simple actually but hard to type the equations out. I strongly suggest you read this paper, every tiny detail you should know on matrix derivation in DL is listed in this paper.
[3] One sample as input is used in the above example(as a vector), you can substitute 1 with any batch numbers(become matrix), and the equations still hold.

Why convolution in spatial domain equal to multiplication in frequency domain?

Why it is said that "convolution of an image in spatial domain is equal to multiplication in frequency domain" ?
Could anyone please explain it briefly?
StackOverflow, unfortunately, doesn't support MathJaX hence it is hard to show the math here.
One way to explain is that Convolution is Linear Invariant Operator.
As you know, Linear Time / Spatially Invariant Systems basically do one thing - Delay and Scaling.
The Eigen Functions of Delay and Scaling are the Harmonic Functions.
Which means that give a signal described by harmonic signals (Practically its Fourier Transform) Linear Time / Spatially Invariant Operator only scales it by complex number (Scaling and shifting by phase) which is what you do in the Fourier Domain.
It is similar to Diagonalization in Linear Algebra.
For instance let's thing of the Filter we apply on the image as an operator - A.
So the output of the system is y = A x.
If A is diagonalizable as A = P^T D P where D is diagonal matrix and P P^T = I, namely Unitary Matrix.
So y = A x = P^T D P x hence by defining z = P x and t = P y we get t = D z namely we only need to multiply each element in t and not the whole matrix multiplication.
If you think about P as the Fourier Transom operator then instead of doing Matrix Multiplication you can have element wise multiplication in other domain - Fourier Domain.

What the relationship of least square and X'Xθ = X'y?

I have three points: (1,1), (2,3), (3, 3.123). I assume the hypothesis is , and I want to do linear regression on the three points. I have two methods to calculate θ:
Method-1: Least Square
import numpy as np
# get an approximate solution using least square
X = np.array([[1,1],[2,1],[3,1]])
y = np.array([1,3,3.123])
theta = np.linalg.lstsq(X,y)[0]
print theta
Method-2: Matrix multiplication
We have the following derivation process:
# rank(X)=2, rank(X|y)=3, so there is no exact solution.
print np.linalg.matrix_rank(X)
print np.linalg.matrix_rank(np.c_[X,y])
theta = np.linalg.inv(X.T.dot(X)).dot(X.T.dot(y))
print theta
Both method-1 and method-2 can get result [ 1.0615 0.25133333], it seems that method-2 is equivalent to least square. But, I don't know why, can anyone reveal the underlying principle of their equivalence?
Both approaches are equivalent, because least squares method is
θ = argmin (Xθ-Y)'(Xθ-Y) = argmin ||(Xθ-Y)||^2 = argmin ||(Xθ-Y)||, that means that you try to minimize length of vector (Xθ-Y), so you try to minimize distance between Xθ and Y. X is a constant matrix, so Xθ is vector from column space of X. That means the shortest distance between these two vectors is when Xθ is equal to projection of vector Y to column space of X (can be easy observed from picture). That results to Y^(hat) = Xθ = X(X'X)^(-1)X'Y, where X(X'X)^(-1)X' is the projection matrix to column space of X. After some changes you can observe that this is equivalent with (X'X)θ = X'Y. You can find exact proof in any linear algebra book.

Geometric representation of Perceptrons (Artificial neural networks)

I am taking this course on Neural networks in Coursera by Geoffrey Hinton (not current).
I have a very basic doubt on weight spaces.
https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides%2Flec2.pdf
Page 18.
If I have a weight vector (bias is 0) as [w1=1,w2=2] and training case as {1,2,-1} and {2,1,1}
where I guess {1,2} and {2,1} are the input vectors. How can it be represented geometrically?
I am unable to visualize it? Why is training case giving a plane which divides the weight space into 2? Could somebody explain this in a coordinate axes of 3 dimensions?
The following is the text from the ppt:
1.Weight-space has one dimension per weight.
2.A point in the space has particular setting for all the weights.
3.Assuming that we have eliminated the threshold each hyperplane could be represented as a hyperplane through the origin.
My doubt is in the third point above. Kindly help me understand.
It's probably easier to explain if you look deeper into the math. Basically what a single layer of a neural net is performing some function on your input vector transforming it into a different vector space.
You don't want to jump right into thinking of this in 3-dimensions. Start smaller, it's easy to make diagrams in 1-2 dimensions, and nearly impossible to draw anything worthwhile in 3 dimensions (unless you're a brilliant artist), and being able to sketch this stuff out is invaluable.
Let's take the simplest case, where you're taking in an input vector of length 2, you have a weight vector of dimension 2x1, which implies an output vector of length one (effectively a scalar)
In this case it's pretty easy to imagine that you've got something of the form:
input = [x, y]
weight = [a, b]
output = ax + by
If we assume that weight = [1, 3], we can see, and hopefully intuit that the response of our perceptron will be something like this:
With the behavior being largely unchanged for different values of the weight vector.
It's easy to imagine then, that if you're constraining your output to a binary space, there is a plane, maybe 0.5 units above the one shown above that constitutes your "decision boundary".
As you move into higher dimensions this becomes harder and harder to visualize, but if you imagine that that plane shown isn't merely a 2-d plane, but an n-d plane or a hyperplane, you can imagine that this same process happens.
Since actually creating the hyperplane requires either the input or output to be fixed, you can think of giving your perceptron a single training value as creating a "fixed" [x,y] value. This can be used to create a hyperplane. Sadly, this cannot be effectively be visualized as 4-d drawings are not really feasible in browser.
Hope that clears things up, let me know if you have more questions.
I have encountered this question on SO while preparing a large article on linear combinations (it's in Russian, https://habrahabr.ru/post/324736/). It has a section on the weight space and I would like to share some thoughts from it.
Let's take a simple case of linearly separable dataset with two classes, red and green:
The illustration above is in the dataspace X, where samples are represented by points and weight coefficients constitutes a line. It could be conveyed by the following formula:
w^T * x + b = 0
But we can rewrite it vice-versa making x component a vector-coefficient and w a vector-variable:
x^T * w + b = 0
because dot product is symmetrical. Now it could be visualized in the weight space the following way:
where red and green lines are the samples and blue point is the weight.
More possible weights are limited to the area below (shown in magenta):
which could be visualized in dataspace X as:
Hope it clarifies dataspace/weightspace correlation a bit. Feel free to ask questions, will be glad to explain in more detail.
The "decision boundary" for a single layer perceptron is a plane (hyper plane)
where n in the image is the weight vector w, in your case w={w1=1,w2=2}=(1,2) and the direction specifies which side is the right side. n is orthogonal (90 degrees) to the plane)
A plane always splits a space into 2 naturally (extend the plane to infinity in each direction)
you can also try to input different value into the perceptron and try to find where the response is zero (only on the decision boundary).
Recommend you read up on linear algebra to understand it better:
https://www.khanacademy.org/math/linear-algebra/vectors_and_spaces
For a perceptron with 1 input & 1 output layer, there can only be 1 LINEAR hyperplane. And since there is no bias, the hyperplane won't be able to shift in an axis and so it will always share the same origin point. However, if there is a bias, they may not share a same point anymore.
I think the reason why a training case can be represented as a hyperplane because...
Let's say
[j,k] is the weight vector and
[m,n] is the training-input
training-output = jm + kn
Given that a training case in this perspective is fixed and the weights varies, the training-input (m, n) becomes the coefficient and the weights (j, k) become the variables.
Just as in any text book where z = ax + by is a plane,
training-output = jm + kn is also a plane defined by training-output, m, and n.
Equation of a plane passing through origin is written in the form:
ax+by+cz=0
If a=1,b=2,c=3;Equation of the plane can be written as:
x+2y+3z=0
So,in the XYZ plane,Equation: x+2y+3z=0
Now,in the weight space;every dimension will represent a weight.So,if the perceptron has 10 weights,Weight space will be 10 dimensional.
Equation of the perceptron: ax+by+cz<=0 ==> Class 0
ax+by+cz>0 ==> Class 1
In this case;a,b & c are the weights.x,y & z are the input features.
In the weight space;a,b & c are the variables(axis).
So,for every training example;for eg: (x,y,z)=(2,3,4);a hyperplane would be formed in the weight space whose equation would be:
2a+3b+4c=0
passing through the origin.
I hope,now,you understand it.
Consider we have 2 weights. So w = [w1, w2]. Suppose we have input x = [x1, x2] = [1, 2]. If you use the weight to do a prediction, you have z = w1*x1 + w2*x2 and prediction y = z > 0 ? 1 : 0.
Suppose the label for the input x is 1. Thus, we hope y = 1, and thus we want z = w1*x1 + w2*x2 > 0. Consider vector multiplication, z = (w ^ T)x. So we want (w ^ T)x > 0. The geometric interpretation of this expression is that the angle between w and x is less than 90 degree. For example, the green vector is a candidate for w that would give the correct prediction of 1 in this case. Actually, any vector that lies on the same side, with respect to the line of w1 + 2 * w2 = 0, as the green vector would give the correct solution. However, if it lies on the other side as the red vector does, then it would give the wrong answer.
However, suppose the label is 0. Then the case would just be the reverse.
The above case gives the intuition understand and just illustrates the 3 points in the lecture slide. The testing case x determines the plane, and depending on the label, the weight vector must lie on one particular side of the plane to give the correct answer.

Camera motion from corresponding images

I'm trying to calculate a new camera position based on the motion of corresponding images.
the images conform to the pinhole camera model.
As a matter of fact, I don't get useful results, so I try to describe my procedure and hope that somebody can help me.
I match the features of the corresponding images with SIFT, match them with OpenCV's FlannBasedMatcher and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).
Then I calculate the essential matrix by the camera intrinsic matrix (K):
Mat E = K.t() * F * K;
I decompose the essential matrix to rotation and translation with singular value decomposition:
SVD decomp = SVD(E);
Matx33d W(0,-1,0,
1,0,0,
0,0,1);
Matx33d Wt(0,1,0,
-1,0,0,
0,0,1);
R1 = decomp.u * Mat(W) * decomp.vt;
R2 = decomp.u * Mat(Wt) * decomp.vt;
t1 = decomp.u.col(2); //u3
t2 = -decomp.u.col(2); //u3
Then I try to find the correct solution by triangulation. (this part is from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/ so I think that should work correct).
The new position is then calculated with:
new_pos = old_pos + -R.t()*t;
where new_pos & old_pos are vectors (3x1), R the rotation matrix (3x3) and t the translation vector (3x1).
Unfortunately I got no useful results, so maybe anyone has an idea what could be wrong.
Here are some results (just in case someone can confirm that any of them is definitely wrong):
F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
-0.001052930954975217, -0.001278667878010564, 1]
K = [150, 0, 300;
0, 150, 400;
0, 0, 1]
E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
-0.04396975675562629, -0.05262169424538553, 0.04904210357279387]
t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]
R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
-0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
-0.4503860776474556, 0.8236506374002566, 0.3446041331317597]
First of all you should check if
x' * F * x = 0
for your point correspondences x' and x. This should be of course only the case for the inliers of the fundamental matrix estimation with RANSAC.
Thereafter, you have to transform your point correspondences to normalized image coordinates (NCC) like this
xn = inv(K) * x
xn' = inv(K') * x'
where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think in your case it is K = K'.
With these NCCs you can decompose your essential matrix like you described. You triangulate the normalized camera coordinates and check the depth of your triangulated points. But be careful, in literature they say that one point is sufficient to get the correct rotation and translation. From my experience you should check a few points since one point can be an outlier even after RANSAC.
Before you decompose the essential matrix make sure that E=U*diag(1,1,0)*Vt. This condition is required to get correct results for the four possible choices of the projection matrix.
When you've got the correct rotation and translation you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then, you should compute the reprojection error. Firstly, you compute the reprojected position like this
xp = K * P * X
xp' = K' * P' * X
where X is the computed (homogeneous) 3D position. P and P' are the 3x4 projection matrices. The projection matrix P is normally given by the identity. P' = [R, t] is given by the rotation matrix in the first 3 columns and rows and the translation in the fourth column, so that P is a 3x4 matrix. This only works if you transform your 3D position to homogeneous coordinates, i.e. 4x1 vectors instead of 3x1. Then, xp and xp' are also homogeneous coordinates representing your (reprojected) 2D positions of your corresponding points.
I think the
new_pos = old_pos + -R.t()*t;
is incorrect since firstly, you only translate the old_pos and you do not rotate it and secondly, you translate it with a wrong vector. The correct way is given above.
So, after you computed the reprojected points you can calculate the reprojection error. Since you are working with homogeneous coordinates you have to normalize them (xp = xp / xp(2), divide by last coordinate). This is given by
error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2
If the error is large such as 10^2 your intrinsic camera calibration or your rotation/translation are incorrect (perhaps both). Depending on your coordinate system you can try to inverse your projection matrices. On that account you need to transform them to homogeneous coordinates before since you cannot invert a 3x4 matrix (without the pseudo inverse). Thus, add the fourth row [0 0 0 1], compute the inverse and remove the fourth row.
There is one more thing with reprojection error. In general, the reprojection error is the squared distance between your original point correspondence (in each image) and the reprojected position. You can take the square root to get the Euclidean distance between both points.
To update your camera position, you have to update the translation first, then update the rotation matrix.
t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;
where t_ref and R_ref are your camera state, R and t are new calculated camera rotation and translation, and lambda is the scale factor.

Resources