Matrix of several perspective transformations - opencv

I am writing an image processing program using OpenCV.
I need to transform an image using several perspective transformations.
A perspective transformation is defined by a matrix. I know that a complex affine transform can be obtained by multiplying simple transform matrices (rotation, translation, etc.).
But when I tried to multiply two perspective transformation matrices, I didn't get the transformation matrix that corresponds to applying the first and then the second matrix.
So, how can I get the matrix of several consecutive perspective transformations?

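Suppose you have two perspective matrices C:(x,y)->(u,v) and D:(u,v)->(r,g), and you want to get M:(x,y)->(r,g). The two transformations are:
 ui = (c00*xi + c01*yi + c02) / (c20*xi + c21*yi + c22) (1)
 vi = (c10*xi + c11*yi + c12) / (c20*xi + c21*yi + c22) (2)
 ri = (d00*ui + d01*vi + d02) / (d20*ui + d21*vi + d22) (3)
 gi = (d10*ui + d11*vi + d12) / (d20*ui + d21*vi + d22) (4)
Substitute ui and vi from (1) and (2) into equations (3) and (4). The common denominator (c20*xi + c21*yi + c22) cancels, and the result has the same form as (1) and (2), with the coefficients of the matrix product.
After that you can see that M = D*C. Note the order: C is applied first, so it stands on the right.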
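The substitution can be checked numerically. The following is a minimal sketch in plain Python; the matrices C and D and the helper names matmul3 and apply_homography are made up for illustration and are not from the question. It maps one point through C and then D, and compares the result with a single mapping through M = D*C.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_homography(h, x, y):
    """Map (x, y) through the 3x3 perspective matrix h, dividing by w."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


# Hypothetical example matrices: C is applied first, D second.
C = [[1.0, 0.2, 5.0], [0.1, 1.0, -3.0], [0.001, 0.002, 1.0]]
D = [[0.9, -0.1, 2.0], [0.3, 1.1, 4.0], [0.002, -0.001, 1.0]]
M = matmul3(D, C)  # the second transformation goes on the left

x, y = 10.0, 20.0
u, v = apply_homography(C, x, y)
two_steps = apply_homography(D, u, v)
one_step = apply_homography(M, x, y)
print(two_steps, one_step)  # the two results agree up to rounding
```

If the product gives a different transformation than expected, a likely cause is the order of the factors (C*D instead of D*C) or an element-wise product instead of a matrix product. With NumPy arrays the same composition would be written D @ C before passing the result to cv2.warpPerspective.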

Related

When comparing pictures, SIFT and ORB methods in OpenCV don't work when one picture has been mirrored

I'm trying to find all pictures that are "almost" identical in a folder.
By "almost identical", I mean for example you have an original picture, and you have modifications of this picture.
The modifications can be a change of resolution, conversion to greyscale, cropping, rotation, an added frame or some text, mirroring...
I'm using OpenCV with SIFT and ORB (I choose which method to use for each run; I don't use them at the same time).
For all the picture variations, both SIFT and ORB work quite well, but not for the mirrored picture.
Even if I only mirror the first picture (meaning I don't change anything else), the score is about 10%.
I don't understand this, as I thought that SIFT and ORB were calculating distances between keypoints. When taking a mirror image, the distances don't change, only the direction.
What am I missing?
Here is an extract from my code:
import cv2

if method == 'ORB':
    finder = cv2.ORB_create()
elif method == 'SIFT':
    finder = cv2.xfeatures2d.SIFT_create()
lowe_ratio = 0.86
# find the keypoints and descriptors with SIFT or ORB
for i in range(0,count-1):
    for j in range(i+1,count):
        kp1, des1 = finder.detectAndCompute(all_new_images_to_compare[i],None)
        kp2, des2 = finder.detectAndCompute(all_new_images_to_compare[j],None)
        bf = cv2.BFMatcher()
        matches = bf.knnMatch(des1,des2, k=2)
        # keep only the matches that pass Lowe's ratio test
        good_points = []
        for m,n in matches:
            if m.distance < lowe_ratio * n.distance:
                good_points.append(m)
        # normalise by the larger of the two keypoint counts
        number_keypoints = 0
        if len(kp1) >= len(kp2):
            number_keypoints = len(kp1)
        else:
            number_keypoints = len(kp2)
        percentage_similarity = len(good_points) / (number_keypoints) * 100
        if (percentage_similarity)>=10:
            myfile1=open("C:/Users/ABC/Documents/Find-Similar-Pictures/results_" + method + ".txt","a")
            myfile1.write(str(titles[i]) + "\t" + str(titles[j]) + "\t" + method + " (" + str(lowe_ratio) + ") \t" + str(int(percentage_similarity) ) + "\t\n")
            myfile1.close()
            print(titles[i], titles[j],"== Similarity: " + str(int(percentage_similarity)), method + " (" + str(lowe_ratio) + ")")
    print("___done with file", titles[i])
print("=====done=====")
Thanks a lot for your help
You've discovered a property of ORB's descriptor and SIFT's detector: neither is invariant to reflection.
If you're interested in matching reflected images, you'll need to do one of the following:
Use a symmetric keypoint detector, e.g. FAST, Harris or CenSurE, and then use the SIFT descriptor on the reflected keypoint.
Implement a reflection-invariant descriptor like MBR-SIFT.
There's a good analysis of the problem in this paper: Symmetric Stability of Low Level Feature Detectors
Best of luck!
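To see why mirroring breaks gradient-based descriptors even though distances are preserved, here is a toy sketch in plain Python. It is not SIFT or ORB; orientation_histogram is a hypothetical, heavily simplified stand-in for the orientation histograms such descriptors are built from. Mirroring flips the sign of the horizontal gradient, so every gradient direction theta becomes 180 - theta and the votes land in different bins.

```python
import math


def orientation_histogram(patch, bins=8):
    """Histogram of gradient directions over the interior pixels of a patch."""
    hist = [0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # central differences
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            if gx == 0 and gy == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle // (360.0 / bins)) % bins] += 1
    return hist


# A 5x5 intensity ramp whose gradient is (2, 1) everywhere.
patch = [[2 * x + y for x in range(5)] for y in range(5)]
mirrored = [row[::-1] for row in patch]  # horizontal mirror

hist_original = orientation_histogram(patch)
hist_mirrored = orientation_histogram(mirrored)
print(hist_original)  # all 9 votes in bin 0 (about 27 degrees)
print(hist_mirrored)  # all 9 votes in bin 3 (about 153 degrees)
```

The two histograms contain the same votes in different bins, so a nearest-neighbour comparison of the raw vectors sees them as far apart. One possible workaround (an assumption here, not part of the answer above) is to run the matcher a second time against a horizontally flipped copy, e.g. cv2.flip(image, 1), and keep the better score.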

Lens distortion model vs correction model

The lens model in OpenCV is a distortion model, which maps an ideal position to the corresponding real (distorted) position:
x_distorted = x_corrected ( 1 + k_1 * r^2 + k_2 * r^4 + ...),
y_distorted = y_corrected ( 1 + k_1 * r^2 + k_2 * r^4 + ...),
where r^2 = x_corrected^2 + y_corrected^2 in normalized image coordinates (the tangential distortion is omitted for simplicity). This model is also found in Z. Zhang: "A Flexible New Technique for Camera Calibration," TPAMI 2000, and in "Camera Calibration Toolbox for Matlab" by Bouguet.
On the other hand, Bradski and Kaehler: "Learning OpenCV" introduces on p.376 the lens model as a correction model, which maps a distorted position to the ideal position:
x_corrected = x_distorted ( 1 + k'_1 * r'^2 + k'_2 * r'^4 + ...),
y_corrected = y_distorted ( 1 + k'_1 * r'^2 + k'_2 * r'^4 + ...),
where r'^2 = x_distorted^2 + y_distorted^2 in normalized image coordinates.
Hartley and Zisserman: "Multiple View Geometry in Computer Vision" also describes this model.
I understand that both the correction and distortion models have advantages and disadvantages in practice. For example, the former makes correcting detected feature point locations easy, while the latter makes undistorting the entire image straightforward.
My question is: why do they share the same polynomial expression when they are supposed to be the inverse of each other? I found this document evaluating the invertibility, but its theoretical background is not clear to me.
Thank you for your help.
I think the short answer is: they are just different models, so they're not supposed to be each other's inverse. Like you already wrote, each has its own advantages and disadvantages.
As to invertibility, this depends on the order of the polynomial. A 2nd-order (quadratic) polynomial is easily inverted. A 4th-order one requires some more work, but can still be inverted analytically. But as soon as you add a 6th-order term, you'll probably have to resort to numerical methods to find the inverse, because a polynomial of 5th order or higher is not analytically invertible in the general case.
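As an illustration of the numerical route, here is a minimal sketch in plain Python that inverts a two-coefficient radial distortion model by fixed-point iteration. The function names, the coefficient values and the choice of fixed-point iteration are assumptions made for this example; it is one common way to invert such a model, not the only one, and it only converges when the distortion is moderate.

```python
def distort(x, y, k1, k2):
    """Distortion model: ideal normalized point -> distorted point."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def undistort(xd, yd, k1, k2, iterations=50, tol=1e-12):
    """Invert distort() numerically: solve x = xd / factor(x, y) by iteration."""
    x, y = xd, yd  # initial guess: the distorted point itself
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x_new, y_new = xd / factor, yd / factor
        converged = abs(x_new - x) < tol and abs(y_new - y) < tol
        x, y = x_new, y_new
        if converged:
            break
    return x, y


# Hypothetical coefficients and test point.
k1, k2 = -0.2, 0.05
x_ideal, y_ideal = 0.3, 0.2
xd, yd = distort(x_ideal, y_ideal, k1, k2)
x_rec, y_rec = undistort(xd, yd, k1, k2)
print(x_rec, y_rec)  # close to (0.3, 0.2)
```

The recovered point is not given by a polynomial of the same form with the same coefficients, which is the sense in which the two models are different models rather than exact inverses of each other.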
By Taylor expansion, any sufficiently smooth function can be approximated as c0 + c1*x + c2*x^2 + c3*x^3 + c4*x^4 + ...
The goal is just to find the constants.
In our particular case the expression must be symmetric in x and -x (an even function), so the coefficients of x, x^3, x^5, x^7, ... are equal to zero.

Can you give me a short step-by-step numerical example of the radial basis function kernel trick? I would like to understand how to apply it to a perceptron

I understand the perceptron well, so please focus only on the kernel. I am not familiar with mathematical expressions, so please give me a numerical example and a guide to the kernel.
For example:
My perceptron's hyperplane is x1*w1+x2*w2+x3*w3+b=0. The RBF kernel formula is k(x,z) = exp(-|x-z|^2 / (2*variance^2)). Where does the radial basis function kernel come into play here? Is x an input, and what is the variable z?
And what do I have to calculate the variance of, if it is a variance in the formula?
As far as I understand, I have to plug this formula into the perceptron decision function x1*w1+x2*w2+x3*w3+b=0, but what does it look like once it is plugged in?
I would like a numerical example to avoid confusion.
Linear Perceptron
As you know, linear perceptrons can be trained for binary classification. More precisely, if there are n features x1, x2, ..., xn in n-dimensional space, R^n, and you want to label points in 2 categories, y1 & y2 (usually -1 and +1), you can use a linear perceptron, which defines a hyperplane w1*x1 + ... + wn*xn + b = 0 to do so.
w1*x1 + ... + wn*xn + b > 0 or W.X + b > 0 ==> class = y1
w1*x1 + ... + wn*xn + b < 0 or W.X + b < 0 ==> class = y2
Linear perceptron will work well, only if the problem is linearly separable in Rn. For example, in 2D space, this means that one line can separate the 2 sets of points.
Algorithm
One common algorithm to train the perceptron, i.e., to find the weights and bias, W & b, based on N data points, X1, ..., XN, and their labels, Y1, ..., YN, is the following:
Initialize: W = zeros(n,1); b = 0
For i=1 to N:
    Calculate F(Xi) = W.Xi + b
    If F(Xi)*Yi <= 0:
        W <--- W + Xi*Yi
        b <--- b + Yi
This gives the final values of W & b. Because of how the updates work, W is a linear combination of the training points Xi, more precisely of the ones that were misclassified. So W = a1*X1 + ... + aN*XN, where the a's are in {0,y1,y2} (0 for points that never triggered an update).
Now, if there is a new point, let's say Z, to label, we check the sign of F(Z) = W.Z + b = a1*(X1.Z) + ... + aN*(XN.Z) + b. It is interesting that only the inner products of the new point with the training points take part in it.
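The training loop and the two equivalent ways of evaluating F(Z) can be sketched in plain Python as follows. The data set and the names train_perceptron, primal and dual are invented for this illustration. Unlike the single pass written above, this sketch repeats the pass until no point is misclassified (an assumption, since a single pass is not always enough), so an a-coefficient can in general be a multiple of the label.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(p * q for p, q in zip(a, b))


def train_perceptron(X, Y, max_epochs=100):
    """Return weights W, bias b and per-point coefficients a."""
    W = [0.0] * len(X[0])
    b = 0.0
    a = [0] * len(X)  # a[i] accumulates Y[i] each time X[i] is misclassified
    for _ in range(max_epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, Y)):
            if (dot(W, xi) + b) * yi <= 0:
                W = [w + yi * v for w, v in zip(W, xi)]
                b += yi
                a[i] += yi
                mistakes += 1
        if mistakes == 0:
            break
    return W, b, a


# Hypothetical, linearly separable toy data.
X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
Y = [1, 1, -1, -1]
W, b, a = train_perceptron(X, Y)

Z = (1.0, 0.0)
primal = dot(W, Z) + b  # F(Z) = W.Z + b
dual = sum(ai * dot(xi, Z) for ai, xi in zip(a, X)) + b  # only inner products
print(W, b, a, primal, dual)
```

On this data only the first point triggers an update, so W = (2, 2), b = 1, a = [1, 0, 0, 0], and both expressions give F(Z) = 3.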
Kernel Perceptron
Now, if the problem is not linearly separable, one may try to go to a higher-dimensional space in which a hyperplane can do the classification. As an example, consider a circle in 2D space. The points inside and outside of the circle can't be separated by a line. However, if you find a transformation that takes the points to 3D space such that the first 2 coordinates remain the same for all points, and the 3rd coordinate becomes +1 and -1 for the points inside and outside of the circle respectively, then the plane defined by 3rd coordinate = 0 can separate the points.
Finding such transformations can be difficult and computationally heavy, so the kernel trick is introduced. Notice that we only used the inner product of new points with the training points. The kernel trick exploits this fact and defines the inner product of the transformed points without actually finding the transformation.
If the unknown transformation is P(X), then the kernel function is:
K(Xi,Xj) = <P(Xi),P(Xj)>. So instead of finding P, one defines a kernel function that directly gives the scalar result of the inner product in the high-dimensional space. There are also theorems about which functions can be kernel functions, i.e., correspond to an inner product in another space.
After choosing a kernel function, the algorithm will be modified as follows:
Initialize: F(X) = 0
For i=1 to N:
    Calculate F(Xi)
    If F(Xi)*Yi <= 0:
        F(.) <--- F(.) + K(.,Xi)*Yi + Yi
At the end, F(.) = a1*K(.,X1) + ... + aN*K(.,XN) + b where a's are in {0,y1,y2}.
RBF Kernel
Radial basis function is one type of kernel function that is actually computing the inner product in an infinite-dimensional space. It can be written as
K(Xi,Xj) = exp(- norm2(Xi-Xj)^2 / (2*sigma^2))
Sigma is a parameter that you can tune to find an optimum value. For example, you can train the model with different values of sigma and then pick the best value based on the performance. You can start with sigma = 1.
After training the model to find F(.), for a new data point Z, the sign of F(Z) = a1*K(Z,X1) + ... + aN*K(Z,XN) + b will determine the class.
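A small numerical illustration, as a sketch in plain Python: the four XOR-style points below cannot be separated by a line, but the kernel perceptron with sigma = 1 separates them. The data, the function names and the repeated passes over the data are choices made for this example, not part of the algorithm as written above. With sigma = 1, neighbouring points (squared distance 1) have K = exp(-0.5), about 0.6065, and diagonal points (squared distance 2) have K = exp(-1), about 0.3679. The first steps by hand: F(X1) = 0, a mistake, so a1 = -1 and b = -1; then F(X2) = -1*0.6065 - 1 = -1.6065 while Y2 = +1, a mistake, so a2 = +1 and b = 0; and so on.

```python
import math


def rbf(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    d2 = sum((p - q) ** 2 for p, q in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def decision(z, X, a, b, sigma):
    """F(z) = a1*K(z, X1) + ... + aN*K(z, XN) + b."""
    return sum(ai * rbf(z, xi, sigma) for ai, xi in zip(a, X) if ai != 0) + b


def train_kernel_perceptron(X, Y, sigma=1.0, max_epochs=100):
    a = [0] * len(X)
    b = 0
    for _ in range(max_epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, Y)):
            if decision(xi, X, a, b, sigma) * yi <= 0:
                a[i] += yi  # adds K(., Xi)*Yi to F
                b += yi     # adds Yi to the bias
                mistakes += 1
        if mistakes == 0:
            break
    return a, b


# XOR-style toy data: not linearly separable in 2D.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [-1, 1, 1, -1]
a, b = train_kernel_perceptron(X, Y, sigma=1.0)
scores = [decision(x, X, a, b, 1.0) for x in X]
print(a, b, scores)
```

Every point is misclassified once in the first pass, giving a = [-1, 1, 1, -1] and b = 0. In the second pass no point is misclassified: for example F(X1) = -1 + 2*0.6065 - 0.3679, about -0.155, which has the same sign as Y1 = -1.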
Remarks:
Regarding your question about variance: you don't need to compute any variance from the data. Sigma is a parameter that you choose.
About x and z in your question: in each iteration, you evaluate the kernel between the current data point and each of the previously added points (the points that were misclassified and hence were added to F).
I couldn't come up with a simple instructive numerical example.
References:
I borrowed some notation from
http://alex.smola.org/teaching/pune2007/pune_3.pdf

Uniform discretization of Bezier curve

I need to discretise a 3rd order Bezier curve with points equally distributed along the curve. The curve is defined by four points p0, p1, p2, p3 and a generic point p(t) with 0 < t < 1 is given by:
point_t = (1 - t) * (1 - t) * (1 - t) * p0 + 3 * (1 - t) * (1 - t) * t * p1 + 3 * (1 - t) * t * t * p2 + t * t * t * p3;
My first idea was to discretise t uniformly: t = 0, t_1, ..., t_n, ..., 1
This doesn't work because, in general, we don't end up with a uniform distance between the discretised points.
To sum up, what I need is an algorithm to discretise the parametric curve so that:
|| p(t_n) - p(t_{n+1}) || = d
I thought about recursively halving the Bezier curve with the de Casteljau algorithm up to the required resolution, but this would require a lot of distance calculations.
Any idea on how to solve this problem analytically?
What you are looking for is also called "arc-length parametrisation".
In general, if you subdivide a Bezier curve at fixed intervals of the default parametrisation, the resulting curve segments will not have the same arc length. Here is one way to do it: http://pomax.github.io/bezierinfo/#tracing.
A while ago, I was playing around with a bit of code (curvature flow) that needed the points to be as uniformly separated as possible. Here is a comparison (without proper labeling on axes! ;)) using linear interpolation and monotone cubic interpolation from the same set of quadrature samples (I used 20 samples per curve, each evaluated using a 24-point Gauss-Legendre quadrature) to reparametrise a cubic curve.
[Please note that this is compared with another run of the algorithm using many more nodes and samples, taken as ground truth.]
Here is a demo using monotone cubic interpolation to reparametrise a curve. The function Curve.getLength is the quadrature function.
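For illustration, here is a simplified sketch of arc-length reparametrisation in plain Python. It is not the code from the demo: instead of Gauss-Legendre quadrature and monotone cubic interpolation it uses a dense polyline as the length estimate and linear interpolation to invert the length table, and the names bezier and uniform_points are made up for this example. The points it returns are equally spaced in arc length, which is close to equal chord distance d only when the spacing is small relative to the curvature.

```python
import bisect
import math


def bezier(p0, p1, p2, p3, t):
    """Point on the cubic Bezier curve at parameter t."""
    s = 1.0 - t
    return tuple(s * s * s * a + 3 * s * s * t * b + 3 * s * t * t * c + t * t * t * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def uniform_points(p0, p1, p2, p3, n_segments, samples=1000):
    """Return n_segments + 1 points spaced (approximately) equally in arc length."""
    ts = [i / samples for i in range(samples + 1)]
    pts = [bezier(p0, p1, p2, p3, t) for t in ts]
    cum = [0.0]  # cumulative polyline length at each sample
    for a, b in zip(pts, pts[1:]):
        cum.append(cum[-1] + math.hypot(b[0] - a[0], b[1] - a[1]))
    total = cum[-1]
    out = []
    for k in range(n_segments + 1):
        target = total * k / n_segments
        j = min(bisect.bisect_left(cum, target), samples)
        if j == 0:
            t = 0.0
        else:
            span = cum[j] - cum[j - 1]
            frac = (target - cum[j - 1]) / span if span > 0 else 0.0
            t = ts[j - 1] + frac * (ts[j] - ts[j - 1])  # linear inversion of the table
        out.append(bezier(p0, p1, p2, p3, t))
    return out


# A straight segment from x=0 to x=3 whose default parametrisation is very uneven.
ctrl = ((0.0, 0.0), (0.0, 0.0), (3.0, 0.0), (3.0, 0.0))
naive = [bezier(*ctrl, k / 4) for k in range(5)]
even = uniform_points(*ctrl, n_segments=4)
print([round(p[0], 3) for p in naive])
print([round(p[0], 3) for p in even])
```

In this example the uniform-t samples bunch up near both ends of the segment, while the reparametrised points fall close to x = 0, 0.75, 1.5, 2.25 and 3. The number of table samples trades accuracy for cost; a quadrature-based length function would need fewer evaluations for the same accuracy.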

OpenCV DFT_INVERSE different from Matlab's ifft

I am trying to filter a signal using OpenCV's dft function. I start by taking the signal in the time domain:
x = [0.0201920000000000 -0.0514940000000000 0.0222140000000000 0.0142460000000000 -0.00313500000000000 0.00270600000000000 0.0111770000000000 0.0233470000000000 -0.00162700000000000 -0.0306280000000000 0.0239410000000000 -0.0225840000000000 0.0281410000000000 0.0265510000000000 -0.0272180000000000 0.0223850000000000 -0.0366850000000000 0.000515000000000000 0.0213440000000000 -0.0107180000000000 -0.0222150000000000 -0.0888300000000000 -0.178814000000000 -0.0279280000000000 -0.144982000000000 -0.199606000000000 -0.225617000000000 -0.188347000000000 0.00196200000000000 0.0830530000000000 0.0716730000000000 0.0723950000000000]
Convert it to the Fourier domain using:
cv::dft(x, x_fft, cv::DFT_COMPLEX_OUTPUT, 0);
Eliminate the unwanted frequencies:
for(int k=0; k<32;k++){
    if(k==0 || k>6 )
    {
        x_fft.ptr<float>(0)[2*k+0]=0;
        x_fft.ptr<float>(0)[2*k+1]=0;
    }
}
Convert it back to time domain:
cv::dft(x_fft, x_filt, cv::DFT_INVERSE, 0);
In order to check my results I compared them to Matlab. I took the same signal x and converted it to the Fourier domain using x_mfft = fft(x); The results are similar to the ones I get from OpenCV, except that in OpenCV I only get the left side, while in Matlab I get the symmetric values too.
After this I set x_mfft(1) and x_mfft(8:32) to 0 in Matlab (Matlab indices start at 1, so these correspond to k=0 and k>6 above), and now the signals look exactly the same, except that in Matlab they are in complex form, while in OpenCV they are separated: real part in one channel, imaginary part in the other.
The problem is that when I perform the inverse transform in Matlab using x_mfilt = ifft(x_mfft), the results are completely different from what I get using OpenCV.
Matlab:
0.0126024108604191 + 0.0100628178150509i 0.00278762121814893 - 0.00615997579216921i 0.0116716145588075 - 0.0150834711251450i 0.0204808089882897 - 0.00937680194210788i 0.0187164132302469 - 0.000843687942567208i 0.0132322795522116 - 0.000108642129381095i 0.0140282455278201 - 0.00325620843335947i 0.0190436542174946 - 0.000556561558544529i 0.0182379867325824 + 0.00764390022568001i 0.00964801276734883 + 0.0107158342431018i 0.00405220362962359 + 0.00339496875258604i 0.0108096973356501 - 0.00476499376334313i 0.0236507440224628 - 0.000415067678294738i 0.0266197220512826 + 0.0154626911663024i 0.0142805873081583 + 0.0267004219364679i 0.000314527358302778 + 0.0215255889620223i 0.00173512964620177 + 0.00865151513638104i 0.0169666351363477 + 0.00836162056544561i 0.0255915540012784 + 0.0277878383595920i 0.0118710562486680 + 0.0506446948330055i -0.0160165379892836 + 0.0553846122152651i -0.0354343989166415 + 0.0406080858067314i -0.0370261047451452 + 0.0261077990289579i -0.0365120038155127 + 0.0268311542287801i -0.0541841640123775 + 0.0312446266697320i -0.0854132555297956 + 0.0125342802025550i -0.0989182320365535 - 0.0377079727602073i -0.0686133217915410 - 0.0925138855355046i -0.00474198249025186 - 0.111728716441247i 0.0515933837210975 - 0.0814138940625859i 0.0663201317560107 - 0.0279433757588921i 0.0426055814586485 + 0.00821080477569232i
OpenCV after cv::dft(x_fft, x_filt, cv::DFT_INVERSE, 0);
Channel 1:
0.322008 -0.197121 -0.482671 -0.300055 -0.026996 -0.003475 -0.104199 -0.017810 0.244606 0.342909 0.108642 -0.152477 -0.013281 0.494806 0.854412 0.688818 0.276848 0.267571 0.889207 1.620622 1.772298 1.299452 0.835450 0.858602 0.999833 0.401098 -1.206658 -2.960446 -3.575316 -2.605239 -0.894184 0.262747
Channel 2:
0.403275 0.089205 0.373494 0.655387 0.598925 0.423432 0.448903 0.609397 0.583616 0.308737 0.129670 0.345907 0.756820 0.851827 0.456976 0.010063 0.055522 0.542928 0.818924 0.379870 -0.512527 -1.133893 -1.184826 -1.168379 -1.733893 -2.733226 -3.165383 -2.195622 -0.151738 1.650990 2.122242 1.363375
What am I missing? Shouldn't the results be similar? How can I check if the inverse transform in opencv is done correctly?
Later EDIT:
After struggling with the problem for a few hours, I decided to plot the results from Matlab and OpenCV, and to my surprise they were very similar.
[Plots of the imaginary parts and the real parts, Matlab vs. OpenCV]
So obviously it's a matter of a scale factor. After dividing them element by element, this factor turns out to be 32, the length of the signal. Can someone explain why this happens?
The obvious solution is to use cv::dft(x_fft, x_filt, cv::DFT_INVERSE+cv::DFT_SCALE, 0); so I guess this topic is answered, but I'm still interested in why it is this way.
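There is no standard for the scale factor used by FFT libraries. Some apply none, some include a factor of 1/N in the inverse, some use 1/sqrt(N). Here, Matlab's ifft includes the 1/N factor, while cv::dft applies no scaling unless DFT_SCALE is set, which explains the factor of N = 32. You have to test or look in the documentation for each particular library.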
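The effect can be reproduced with a naive DFT in plain Python. This is only an illustrative sketch: the dft function below and its inverse and scale flags are hypothetical and are meant to mimic the two conventions, not the implementation of either library.

```python
import cmath


def dft(x, inverse=False, scale=False):
    """Naive O(N^2) DFT; with inverse=True the exponent sign is flipped."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    if scale:
        out = [v / n for v in out]  # the 1/N convention
    return out


signal = [0.5, -1.0, 2.0, 0.25]
spectrum = dft(signal)
unscaled = dft(spectrum, inverse=True)            # no scaling: N times the signal
scaled = dft(spectrum, inverse=True, scale=True)  # 1/N scaling: the signal itself

print([round(v.real, 6) for v in unscaled])
print([round(v.real, 6) for v in scaled])
```

Without the 1/N factor the round trip returns the input multiplied by N (here 4, in the question 32), which is exactly the discrepancy observed between the two libraries.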
