Minimize an equation using OpenCV

I need to solve the following equation:
I know the matrix G; how can I find the matrix p subject to ||p|| = 1?
Currently I am solving it in OpenCV as follows:
Mat w, u, EigenVectors;
SVD::compute(A, w, u, EigenVectors); // w: singular values, u: left singular vectors, EigenVectors: V^T
Mat p = EigenVectors.row(EigenVectors.rows - 1); // row corresponding to the smallest singular value
I want to know how I can ensure the condition ||p|| = 1.
Also, I want to know the significance and meaning of the other rows of EigenVectors (which is really V transposed).

I believe you can use cv::SVD::solveZ(). It finds a unit-length solution x of a singular linear system A * x = 0, so the ||x|| = 1 constraint is satisfied by construction.
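If you want to see what it does, here is a minimal NumPy sketch of the same idea (G below is a stand-in for your known matrix). It also answers your second question: the rows of V^T are orthonormal right singular vectors, one per singular value, and the last one minimizes ||G p||.
import numpy as np

G = np.random.randn(8, 4)    # stand-in for your known matrix
_, _, Vt = np.linalg.svd(G)  # G = U @ diag(w) @ Vt
p = Vt[-1]                   # right singular vector of the smallest singular value
print(np.linalg.norm(p))     # ~1.0: rows of Vt are unit-length by construction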

Looks like you need to use the Lagrange multipliers method.
As far as I know, OpenCV has no ready-to-use tools for that.
A good example for MATLAB: Lagrange Multipliers
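For the ||p|| = 1 constraint specifically, the Lagrange conditions reduce to an eigenvalue problem, so no special solver is needed. A minimal NumPy sketch (G is a stand-in matrix):
import numpy as np

G = np.random.randn(8, 4)             # stand-in for the known matrix
# stationary points of ||G*p||^2 - lambda*(||p||^2 - 1) satisfy (G^T G) p = lambda p
vals, vecs = np.linalg.eigh(G.T @ G)  # eigenvalues in ascending order
p = vecs[:, 0]                        # unit-norm eigenvector of the smallest eigenvalue minimizes ||G*p||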

Related

Efficient pseudo-inverse for PyTorch 2D convolution

Background:
Thanks for your attention! I am learning the basics of 2D convolution, linear algebra and PyTorch. I have run into an implementation problem with the pseudo-inverse of the convolution operator. Specifically, I have no idea how to implement it in an efficient way. Please see the problem statement below for details. Any help/tip/suggestion is welcome.
(Thanks a lot for your attention!)
The Original Problem:
I have an image feature x with shape [b,c,h,w] and a 3x3 convolutional kernel K with shape [c,c,3,3], giving y = K * x. How can I implement the corresponding pseudo-inverse on y in an efficient way?
Writing the convolution as a matrix product, y = K * x = Ax: how do I implement x_hat = (A^+)y?
I guess there should be some way to do this using torch.fft. However, I still have no idea how to implement it, and I do not know if an implementation already exists.
import torch
import torch.nn.functional as F
c = 32
K = torch.randn(c, c, 3, 3)
x = torch.randn(1, c, 128, 128)
y = F.conv2d(x, K, padding=1)
print(y.shape)
# How to implement pseudo-inverse for y = K * x in an efficient way?
Some of My Efforts:
I know that 2D convolution is a linear operator, equivalent to a matrix product. We could actually write out the matrix form of the convolution and calculate its pseudo-inverse. However, I think this type of operation would be inefficient, and I have no idea how to implement it in an efficient way.
According to Wikipedia, the pseudo-inverse should satisfy the property A(A_pinv(x)) = x (at least for x in the range of A), where A is the convolutional operator, A_pinv is its pseudo-inverse, and x may be any image feature.
(Thanks again for reading such a long post!)
This takes the problem to another level.
The convolution itself is a linear operation: you can determine the matrix of the operation and solve a least-squares problem directly [1], or compute the pseudo-inverse as you mentioned and then apply it to different outputs, predicting a projection of the input.
I am changing your code to use padding=0:
import torch
import torch.nn.functional as F
# your code
c = 32
K = torch.randn(c, c, 3, 3)  # 3x3 kernel, as in the question
x = torch.randn(4, c, 128, 128)
y = F.conv2d(x, K, bias=torch.zeros((c,)))
Also, as you probably already suggested, the convolution can be computed as ifft(fft(h) * fft(x)). However, conv2d actually computes a cross-correlation, so you have to conjugate the filter, leading to ifft(conj(fft(h)) * fft(x)). You also have to apply this along two axes, and you have to make sure both FFTs are computed at the same size; since the data is real, we can use the multi-dimensional real FFT. To be complete, conv2d works on multiple channels, so we have to calculate summations of convolutions. Since the FFT is linear, we can simply compute the summations in the frequency domain using einsum.
s = y.shape[-2:]             # spatial FFT size
K_f = torch.fft.rfftn(K, s)  # real FFT over the last two axes
x_f = torch.fft.rfftn(x, s)
# conjugate the filter (cross-correlation) and sum over input channels k
y_f = torch.einsum('jkxy,ikxy->ijxy', K_f.conj(), x_f)
y_hat = torch.fft.irfftn(y_f, s)
Except for the borders it should be accurate (remember the FFT computes a cyclic convolution):
print(torch.max(abs(y_hat[:, :, :-2, :-2] - y)))
Now, notice the pattern jk,ik->ij in the einsum: it means y_f[i,j] = sum(K_f[j,k] * x_f[i,k]), i.e. y_f = x_f @ K_f.T, if @ denotes the matrix product over the first two dimensions. So to invert this operation we can interpret the first two dimensions as matrices. The function pinv computes pseudo-inverses on the last two axes, so in order to use it we have to permute the axes. If we right-multiply the output by the pseudo-inverse of the transposed K_f, we should invert this operation.
s = 128, 128
K_f = torch.fft.rfftn(K, s)
K_f_inv = torch.linalg.pinv(K_f.T).T  # pinv acts on the last two axes, hence the transposes
y_f = torch.fft.rfftn(y_hat, s)
x_f = torch.einsum('jkxy,ikxy->ijxy', K_f_inv.conj(), y_f)
x_hat = torch.fft.irfftn(x_f, s)
print(torch.mean((x - x_hat)**2) / torch.mean(x**2))  # relative reconstruction error
Notice that I am using the full (cyclic) convolution, but conv2d actually crops the output. Let's apply that, with k = 3 the kernel size:
k = 3  # kernel size
y_hat[:, :, 128-(k-1):, :] = 0
y_hat[:, :, :, 128-(k-1):] = 0
Repeating the calculation, you will see that the input is no longer recovered accurately, so you have to be careful about what you do with your convolution. But in the situations where you can get this to work, it will indeed be efficient.
s = 128,128
K_f = torch.fft.rfftn(K, s)
K_f_inv = torch.linalg.pinv(K_f.T).T
y_f = torch.fft.rfftn(y_hat, s)
x_f = torch.einsum('jkxy,ikxy->ijxy', K_f_inv.conj(), y_f)
x_hat = torch.fft.irfftn(x_f, s)
print(torch.mean((x - x_hat)**2) / torch.mean((x)**2))

Image derivatives using Gaussian

This is the function I have for calculating image derivatives. Please help me understand this code, as I am new to this field. If anyone could give me some links to understand this concept, I'd be grateful. Some doubts that I have:
Why are we using ndgrid here?
What are the directions 'x', 'y', 'xx', ('xy', 'yx'), 'yy' here?
How and why does the formula for the Gaussian change according to the direction?
Why are we using imfilter at the end?
function D = calc_image_derivatives(I, sigma, direction)
[x,y] = ndgrid(floor(-3*sigma):ceil(3*sigma), floor(-3*sigma):ceil(3*sigma));
switch(direction)
    case 'x'
        DGauss = -(x./(2*pi*sigma^4)).*exp(-(x.^2+y.^2)/(2*sigma^2));
    case 'y'
        DGauss = -(y./(2*pi*sigma^4)).*exp(-(x.^2+y.^2)/(2*sigma^2));
    case 'xx'
        DGauss = 1/(2*pi*sigma^4) * (x.^2/sigma^2 - 1) .* exp(-(x.^2 + y.^2)/(2*sigma^2));
    case {'xy','yx'}
        DGauss = 1/(2*pi*sigma^6) * (x .* y) .* exp(-(x.^2 + y.^2)/(2*sigma^2));
    case 'yy'
        DGauss = 1/(2*pi*sigma^4) * (y.^2/sigma^2 - 1) .* exp(-(x.^2 + y.^2)/(2*sigma^2));
end
D = imfilter(I, DGauss, 'conv', 'replicate');
This code calculates various directional derivatives of the image: x/y are first directional derivatives, xx/yy/xy are second derivatives. The digital filter used for the derivation is a 2D Gaussian of standard deviation sigma, differentiated by the appropriate partial derivative (for example, in the case 'xx' the Gaussian is differentiated twice with respect to x). From your question I'm not sure you're familiar with the notion of a partial derivative; you can Google it.
ndgrid is used to create grid matrices; this is a very common approach in MATLAB. Perhaps you know the function meshgrid; it is the same, only ndgrid can also create grid matrices of higher dimensions.
imfilter is used to filter the image with the digital filter. By default imfilter computes a correlation, but the 'conv' flag used here makes it a true convolution. The result is an estimate of the required derivative.
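For intuition, here is a rough NumPy/SciPy sketch of the 'x' case; the function name is mine, and the grid, kernel formula and replicated border mirror the MATLAB code:
import numpy as np
from scipy.ndimage import convolve

def gaussian_derivative_x(I, sigma):
    # integer offsets covering +/- 3*sigma, like ndgrid in the MATLAB code
    r = np.arange(np.floor(-3 * sigma), np.ceil(3 * sigma) + 1)
    x, y = np.meshgrid(r, r, indexing='ij')  # indexing='ij' matches ndgrid
    # partial derivative of the 2D Gaussian with respect to x
    DGauss = -(x / (2 * np.pi * sigma**4)) * np.exp(-(x**2 + y**2) / (2 * sigma**2))
    # mode='nearest' replicates border pixels, like imfilter's 'replicate'
    return convolve(I, DGauss, mode='nearest')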

OpenCV: Essential Matrix Decomposition

I am trying to extract the rotation matrix and translation vector from the essential matrix:
SVD svd(E, SVD::MODIFY_A);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W(0, -1, 0,
          1,  0, 0,
          0,  0, 1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; // or svd_u * Mat(W) * svd_vt
Mat_<double> t = svd_u.col(2);                // or -svd_u.col(2)
However, when I use R and T (e.g. to obtain rectified images), the result does not seem to be right (black images or other obviously wrong outputs), even though I have tried every combination of the possible R and T.
I suspect E. According to the textbooks, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w, which is supposed to be diag(1, 1, 0) (at least up to scale), is not. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and the calculation of R and T done in the right way?
If the calculation is right, why are the internal rules of the essential matrix not satisfied by the results?
If everything about E, R, and T is fine, why are the rectified images obtained from them not correct?
I get E from the fundamental matrix, which I believe is right: I drew epipolar lines on both the left and right images and they all pass through the related points (for all 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
Thanks!
I see two issues.
First, discounting the negligible value of the third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100. That sounds small, but translated into pixel error it can amount to much more.
So I'd start by replacing E with the ideal one after the SVD decomposition: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with the 3D points in front of the camera). You may be hitting one of the non-physical ones. See http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E .
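A sketch of both fixes in Python, assuming pts1 and pts2 are your matched points and Kc is your camera matrix (cv2.recoverPose performs the cheirality check that rules out the three non-physical candidates):
import numpy as np
import cv2

# 1. replace E by the closest ideal essential matrix: Er = U * diag(1,1,0) * Vt
U, w, Vt = np.linalg.svd(E)
Er = U @ np.diag([1.0, 1.0, 0.0]) @ Vt

# 2. of the four candidate (R, t) pairs, keep the one that puts the
#    triangulated 3D points in front of both cameras
retval, R, t, mask = cv2.recoverPose(Er, pts1, pts2, Kc)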

How can I compute the SVD and verify that the ratio of the first-to-last singular value is sane with OpenCV?

I want to verify that a homography matrix will give good results, and this answer
explains how, but I don't know how to implement it. So can anyone recommend how I may use OpenCV to compute the SVD and verify that the ratio of the first-to-last singular value is sane?
There are several ways to compute the SVD in OpenCV:
cv::SVD homographySVD(homography, cv::SVD::FULL_UV); // constructor
// or:
homographySVD(newHomography, cv::SVD::FULL_UV);      // operator ()
homographySVD.w.at<double>(0, 0);                    // access the first singular value
// alternatives:
cv::Mat w;
cv::SVD::compute(homography, w);                     // compute just the singular values
cv::eigen(homography, w);                            // note: cv::eigen expects a symmetric matrix
Have a look at the documentation for cv::SVD and cv::eigen for more details.
You can compute the SVD in Python using NumPy.
For example:
import numpy as np
U, s, Vh = np.linalg.svd(a, full_matrices=True)
which factors the matrix a of dimension M x N as u @ np.diag(s) @ vh, where u and vh are unitary and s is a 1-D array of a's singular values.
More can be found in the NumPy documentation for numpy.linalg.svd.
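To actually check the first-to-last singular-value ratio from the linked answer, something like the sketch below works; the threshold is an arbitrary illustration, not a canonical value:
import numpy as np

_, s, _ = np.linalg.svd(homography)  # singular values, sorted in descending order
ratio = s[0] / s[-1]                 # condition number of the homography
if ratio > 1e7:                      # example threshold; tune for your application
    print("badly conditioned homography, ratio =", ratio)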

Cascaded Hough Transform in OpenCV

Is it possible to perform a Cascaded Hough Transform in OpenCV? I understand it's just an HT followed by another one. The problem I'm facing is that the values returned are always rho and theta, never in slope-intercept (y = mx + c) form.
Is it possible to convert these values back to slope-intercept form and split them into sub-spaces so I can detect vanishing points?
Or is it just better to program an implementation of the HT myself in, say, Python?
You could try to populate the Hough domain with the m and c parameters instead: y = mx + c can be rewritten as c = y - mx, so instead of the usual rho = x cos(theta) + y sin(theta), you accumulate c = y - mx.
Normally, you'd go through the thetas and calculate rho, then increment the accumulator value for that pair of rho and theta. Here, you'd go through the values of m and calculate the values of c, then accumulate that (m, c) element in the accumulator. The bin with the most votes would be the right (m, c).
// going through the image looking for edge pixels
for (i = 0; i < numrows; i++)
{
    for (j = 0; j < numcols; j++)
    {
        if (img[i*numcols + j] > 1)
        {
            for (n = first_m; n < last_m; n++)
            {
                index = i - n * j;  // c = y - m*x, with i as y and j as x
                accum[n][index]++;
            }
        }
    }
}
I guess where this becomes ineffective is that it's hard to define the step size for going through m, since it should technically run from -infinity to infinity, so you'd have trouble. So much for the Hough transform in terms of m and c.
I guess you could go the other way and isolate m, so that m = (y - c)/x. Now you cycle through a bunch of c values that make sense, and it's much more manageable, though it's still hard to define your accumulator matrix because m still has no limit. I guess you could limit the values of m that you are interested in looking for.
It makes much more sense to go with rho and theta, convert them into y = mx + c, and then even make a brand-new image and re-run the Hough transform on it.
I don't think OpenCV can perform cascaded Hough transforms. You should convert the lines to x-y space yourself. This article might help you:
http://aishack.in/tutorials/converting-lines-from-normal-to-slopeintercept-form/
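As for the conversion itself: solving rho = x cos(theta) + y sin(theta) for y gives y = (-cos(theta)/sin(theta)) x + rho/sin(theta). A minimal sketch (watch out for near-vertical lines, where sin(theta) is close to 0):
import numpy as np

def polar_to_slope_intercept(rho, theta, eps=1e-9):
    # from rho = x*cos(theta) + y*sin(theta) to y = m*x + c
    s, c = np.sin(theta), np.cos(theta)
    if abs(s) < eps:
        raise ValueError("near-vertical line: slope-intercept form is ill-defined")
    return -c / s, rho / s  # (m, c)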
