Hi guys, I've been thinking about this question:
I know that we use the Fourier transform to get into the frequency domain to process the image.
I read the textbook; it said that when we are done processing the image in the Fourier domain, we have to invert it back to get the processed image.
And the textbook taught to take the real part of the inverse.
However, when I go through the OpenCV tutorial, no matter whether the OpenCV or the NumPy version is used, they eventually use magnitude (for OpenCV) or np.abs (for NumPy).
For OpenCV, the inverse returns two channels which contain the real and imaginary components. When I took the real part of the inverse, I got a totally weird image.
Could somebody explain the meaning behind all of this:
Why use magnitude or abs to get the processed image?
What's wrong with the textbook instruction (take the real part of the inverse)?
The textbook is right, the tutorial is wrong.
A real-valued image has a complex conjugate symmetry in the Fourier domain. This means that the FFT of the image will have a specific symmetry. Any processing that you do must preserve this symmetry if you want the inverse transform to remain real-valued. If you do this processing wrong, then the inverse transform will be complex-valued, and probably nonsensical.
If you preserve the symmetry in the Fourier domain properly, then the imaginary component of the inverse transform will be nearly zero (likely different from zero because of numerical imprecision). Discarding this imaginary component is the correct thing to do. Computing the magnitude will yield the same result, except all negative values will become positive (note some filters are meant to produce negative values, such as derivative filters), and at an increased computational cost.
For example, a convolution is a multiplication in the Fourier domain. The filter in the Fourier domain must be real-valued and symmetric around the origin. Often people will confuse where the origin is in the Fourier domain, and multiply by a filter that seems symmetric but actually is shifted with respect to the origin, making it not symmetric. This shift introduces a phase change in the inverse transform (see the shift property of the Fourier transform). The magnitude of the inverse transform is not affected by the phase change, so taking the magnitude of this inverse transform yields an output that sort of looks OK, except if one expects to see negative values in the filter result. It would have been better to correctly understand the FFT algorithm, create a properly symmetric filter in the Fourier domain, and simply keep the real part of the inverse transform.
Nonetheless, some filters are specifically designed to break the symmetry and yield a complex-valued filter output. For example, the Gabor filter has an even (symmetric) component and an odd (anti-symmetric) component. The even component yields a real-valued output, the odd component yields an imaginary-valued output. In this case, it is the magnitude of the complex value that is of interest. Likewise, a quadrature filter is specifically meant to produce a complex-valued output. This output forms the analytic signal (or its multi-dimensional extension, the monogenic signal), of which both the magnitude and the phase are of interest, for example as used in the phase congruency method of edge detection.
Looking at the linked tutorial, it is the line
fshift[crow-30:crow+30, ccol-30:ccol+30] = 0
which generates the Fourier-domain filter and applies it to the image (it is equivalent to multiplying by a filter of 1s and 0s). This tutorial correctly computes the origin of the Fourier domain (though for Python 3 you would use crow, ccol = rows//2, cols//2 to get integer division). But the filter above is not symmetric around that origin. In Python, crow-30:crow+30 indicates 30 pixels to the left of the origin, but only 29 pixels to the right (the right bound is not included!). The correct filter would be:
fshift[crow-30:crow+30+1, ccol-30:ccol+30+1] = 0
With this filter, the inverse transform is purely real (imaginary component has values in the order of 1e-13, which is numerical errors). Thus, it is now possible (and correct) to replace img_back = np.abs(img_back) with img_back = np.real(img_back).
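For concreteness, here is a minimal sketch of the corrected NumPy pipeline, with the symmetric mask and np.real at the end (the input file name is a placeholder, not from the tutorial):

import numpy as np
import cv2

img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)  # placeholder file name

f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)          # move zero frequency to the center

rows, cols = img.shape
crow, ccol = rows // 2, cols // 2    # origin of the shifted Fourier domain

# Symmetric high-pass mask: 30 pixels on BOTH sides of the origin; the
# upper bound is +30+1 because Python slices exclude the right bound.
fshift[crow-30:crow+30+1, ccol-30:ccol+30+1] = 0

f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)

# The imaginary part is now numerical noise (~1e-13), so keep the real part.
img_back = np.real(img_back)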
Related
I have an image where I need to detect an object as fast as possible. I also know that I only need to detect the object closest to the center.
AFAIK OpenCV's matchTemplate works somewhat like this (pseudocode):
for(x in width):
    for(y in height):
        value = calcSimilarity(inputImage, searchedImage, x, y)
        matched[x][y] = value
After that, I have to loop through the resulting image and find the point closest to the center, which is all quite a waste.
So I'm wondering if I can do something like:
coordsGen = new CoordsGen() // a class that generates specific coords for me
while(!coordsGen.stop):
    x, y = coordsGen.next()
    value = calcSimilarity(inputImage, searchedImage, x, y)
    if(value > threshold):
        return x, y
Basically what I need here is the calcSimilarity function. This would allow me to optimize the process greatly.
There are many choices of similarity scoring methods for template matching in general.*
OpenCV has 3 available template matching modes:
Sum of square differences (Euclidean distance)
Cross-correlation
Pearson correlation coefficient
And in OpenCV each of those three have normed/scaled versions as well:
Normalized sum of square differences
Normalized cross-correlation
Normalized Pearson correlation coefficient
You can see the actual formulas used in the OpenCV docs under TemplateMatchModes, though these agree with the general formulas you can find everywhere for the above methods.
You can code the template matching yourself instead of using OpenCV. However, note that OpenCV is optimized for these operations and in general is blazing fast at template matching. OpenCV uses a DFT to perform some of these computations to reduce the computational load. For example, see:
Why is opencv's Template Matching ... so fast?
OpenCV Sum of squared differences speed
You can also use OpenCV's minMaxLoc() to find the minimum/maximum value instead of looping through yourself (see the sketch below). Also, you didn't specify how you're accessing your values, but not all lookup methods are as fast as others. See How to scan images to see the fastest Mat access operations. Spoiler: raw pointers.
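As a quick reference, a minimal sketch of matchTemplate plus minMaxLoc in Python (the file names are placeholders):

import cv2

img = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)       # placeholder
templ = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE)  # placeholder

result = cv2.matchTemplate(img, templ, cv2.TM_CCORR_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

# For TM_CCORR_NORMED the best match is the maximum; for the TM_SQDIFF
# variants it would be the minimum instead.
print('best match at', max_loc, 'with score', max_val)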
The main speedup your optimization would look to give you is early termination of the function. However, I don't think you'll achieve faster times in general by coding it yourself, unless there's a significantly smaller subset of the original image that the template is usually in.
A better method to reduce search time if your images are very big would be to use a pyramid resolution approach (a sketch follows below). Basically, make the template and search images 1/2 your image size, then 1/2 of that, then 1/2 of that, and so on. Then you start the template matching on a small 1/16th-or-so sized image and find the general location of the template. Then you do the same for the next image size up, but you only search a small subset around where your template was at the previous scale. Each time you grow the image size closer to the original, you're only looking for small differences of a few pixels to nail down the position more accurately. The general location is first found with the smallest scaled image, which only takes a fraction of the time to find compared to the original image size, and then you simply refine it by scaling up.
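Here is a rough Python sketch of that coarse-to-fine idea (the level count and search margin are illustrative, not canonical values):

import cv2

def pyramid_match(img, templ, levels=4, margin=8):
    # Build pyramids: index 0 is full resolution, the last index is smallest.
    imgs, templs = [img], [templ]
    for _ in range(levels - 1):
        imgs.append(cv2.pyrDown(imgs[-1]))
        templs.append(cv2.pyrDown(templs[-1]))

    # Full search at the coarsest level to find the general location.
    res = cv2.matchTemplate(imgs[-1], templs[-1], cv2.TM_CCOEFF_NORMED)
    _, _, _, (x, y) = cv2.minMaxLoc(res)

    # Refine at each finer level, searching only around the scaled-up estimate.
    for lvl in range(levels - 2, -1, -1):
        x, y = 2 * x, 2 * y
        th, tw = templs[lvl].shape[:2]
        x0, y0 = max(x - margin, 0), max(y - margin, 0)
        x1 = min(x + tw + margin, imgs[lvl].shape[1])
        y1 = min(y + th + margin, imgs[lvl].shape[0])
        roi = imgs[lvl][y0:y1, x0:x1]
        res = cv2.matchTemplate(roi, templs[lvl], cv2.TM_CCOEFF_NORMED)
        _, _, _, (dx, dy) = cv2.minMaxLoc(res)
        x, y = x0 + dx, y0 + dy
    return x, y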
* Note that OpenCV doesn't include other template matching methods which you may see elsewhere. In particular, OpenCV has a sum of square differences but no sum of absolute differences method. Phase differences are also used as a similarity metric, but don't exist in OpenCV. Either way, cross-correlation and sum of square differences are both extremely common in image processing and, unless you have a special image domain, should work fine.
I used OpenCV's cv::findHomography API to calculate the homography matrix of two planar images.
The matched key points are extracted by SIFT and matched by BFMatcher. As far as I know, cv::findHomography uses RANSAC iterations to find the best four corresponding points from which to compute the homography matrix.
So I drew the selected four pairs of points, along with the contour of the object's edge as mapped by the calculated homography matrix.
The results are in the links:
https://postimg.cc/image/5igwvfrx9/
As we can see, the matched points selected by RANSAC are correct, but the contour shows that the homography is not accurate.
But this test shows that both the selected matched points and the homography are correct:
https://postimg.cc/image/dvjnvtm53/
My guess is that if the selected matched points are too close together, a small error in pixel position will lead to a significant error in the homography matrix. If the four points are in the corners of the image, then a shift of the matched points by 4-6 pixels still gives a good homography matrix.
(Considering homogeneous coordinates, I think this is reasonable, as a small error in the near plane will be amplified far away.)
My question is:
1. Is my guess right?
2. Since the four matched points are generated by the RANSAC iteration, the overall error of all the keypoints is minimal. But how can I get a stable homography, one that at least makes the contour's mapping correct? The theory says that if four corresponding points in a plane are found, the homography matrix can be calculated, but is there any trick used in engineering practice?
I think you're right, and the proximity of the 4 points does not help the accuracy of the result. What you observe may be caused by numerical issues: the result may be locally correct for these 4 points but become worse farther away.
However, RANSAC will not help you here. The reason is simple: RANSAC is a robust estimation procedure that was designed to find the best point pairs among many correspondences (including some wrong ones). Then, in the inner loop of the RANSAC, a standard homography estimation is performed.
You can see RANSAC as a way to reject wrong point correspondences that would provoke a bad result.
Back to your problem:
What you really need is to have more points. In your examples, you use only 4 point correspondences, which is just enough to estimate a homography.
You will improve your result by providing more matches all over the target image. The problem then becomes over-determined, but a least-squares solution can still be found by OpenCV. Furthermore, if there is some error either in the point correspondence process or in some point localization, RANSAC will be able to select the best ones and still give you a reliable result.
If RANSAC results in overfitting on some 4 points (as it seems to be the case in your example), try to relax the constraint by increasing the ransacReprojThreshold parameter.
Alternatively, you can either:
use a different estimator (the robust median CV_LMEDS is a good choice if there are few matching errors)
or use RANSAC in a first step with a large reprojection error (to get a rough estimate) in order to detect the spurious matchings, then use LMEDS on the correct ones (see the sketch below).
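A minimal sketch of that two-step idea in Python, assuming src_pts and dst_pts are Nx2 float32 arrays of matched keypoint coordinates:

import cv2

# Step 1: RANSAC with a generous reprojection threshold for a rough estimate.
H_rough, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,
                                   ransacReprojThreshold=10.0)

# Step 2: re-estimate with LMEDS on the RANSAC inliers only.
inliers = mask.ravel().astype(bool)
H_refined, _ = cv2.findHomography(src_pts[inliers], dst_pts[inliers],
                                  cv2.LMEDS)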
Just to extend #sansuiso's answer, with which I agree:
If you provide around 100 correspondences to RANSAC, probably you are getting more than 4 inliers from cvFindHomography. Check the status output parameter.
To obtain a good homography, you should have many more than 4 correspondences (note that 4 correspondences always give you a homography), which are well distributed around the image and which are not collinear. You can actually use a minimum number of inliers to decide whether the homography obtained is good enough.
Note that RANSAC finds a set of points that are consistent, but the criterion it uses to decide that a set is the best one (the reprojection error) is a bit limited. There is a RANSAC-like method, called MSAC, that uses a slightly different error measure; check it out.
The bad news, in my experience, is that it is unlikely to obtain a 100%-precise homography most of the time. If you have several similar frames, you will probably see the homography change a little between them.
There are tricks to improve this. For example, after obtaining a homography with RANSAC, you can use it to project your model into the image, and look for new correspondences, so you can find another homography that should be more accurate.
Your target has a lot of symmetric and similar elements. As other people mentioned (and you clarified later), the point spacing and the number of points can be a problem. Another problem is that SIFT is not designed to deal with the significant perspective distortions that are present in your case. Try to track your object through smaller rotations and, as was mentioned, reproject it using the latest homography to make it look as close as possible to the original. This will also allow you to skip the processing-heavy SIFT and to use something as lightweight as FAST with cross-correlation of image patches for matching.
You may also eventually come to the understanding that using points is not enough. You have to use all that you've got, and this means lines or conics. If a homography transforms a point as Pb = H * Pa, it is easy to verify that in homogeneous coordinates a line transforms as Lb = Hinv.transposed * La. This follows directly from the equation La'.Pa = 0 = La' * Hinv * H * Pa = La' * Hinv * Pb = Lb'.Pb
The possible minimal configurations are one line and three points, or three lines and one point. Two lines and two points doesn't work. You can use four lines or four points as well. Of course, this means that you cannot use the OpenCV function anymore and have to write your own DLT and then a non-linear optimization.
I'm using findHomography on a list of points and sending the result to warpPerspective.
The problem is that sometimes the result is complete garbage and the resulting image is represented by weird gray rectangles.
How can I detect when findHomography sends me bad results?
There are several sanity tests you can perform on the output. Off the top of my head:
1. Compute the determinant of the homography, and see if it's too close to zero for comfort. Even better, compute its SVD, and verify that the ratio of the first-to-last singular value is sane (not too high). Either result will tell you whether the matrix is close to singular (a sketch of this check follows after the list).
2. Compute the images of the image corners and of its center (i.e. the points you get when you apply the homography to those corners and center), and verify that they make sense, i.e. are they inside the image canvas (if you expect them to be)? Are they well separated from each other?
3. Plot in matlab/octave the output (data) points you fitted the homography to, along with their computed values from the input ones, using the homography, and verify that they are close (i.e. the error is low).
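A minimal sketch of the first test in Python, assuming H is the 3x3 matrix returned by cv2.findHomography (the thresholds here are illustrative, not canonical):

import numpy as np

def homography_looks_sane(H, det_eps=1e-6, max_cond=1e7):
    if H is None:                              # findHomography can fail
        return False
    if abs(np.linalg.det(H)) < det_eps:        # near-singular matrix
        return False
    s = np.linalg.svd(H, compute_uv=False)
    if s[0] / s[-1] > max_cond:                # huge singular-value ratio
        return False
    return True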
A common mistake that leads to garbage results is incorrect ordering of the lists of input and output points, which makes the fitting routine work with wrong correspondences. Check that your indices are correct.
Understanding the degenerate homography cases is the key. You cannot get a good homography if your points are collinear or close to collinear, for example. Also, huge gray squares may indicate extreme scaling. Both cases may arise from the fact that there are very few inliers in your final homography calculation or the mapping is wrong.
To ensure that this never happens:
1. Make sure that points are well spread in both images.
2. Make sure that there are at least 10-30 correspondences (4 is enough if noise is small).
3. Make sure that points are correctly matched and the transformation is a homography.
To find bad homographies, apply the found H to your original points and check the separation from your expected points, that is, |x2 - H*x1| < Tdist, where Tdist is your threshold for the distance error. If only a few points satisfy this threshold, your homography may be bad and you have probably violated one of the above-mentioned requirements. A sketch of this check follows below.
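In Python, this check could look something like the following, assuming pts1 and pts2 are Nx2 arrays of corresponding points (Tdist is your own threshold):

import numpy as np
import cv2

def count_inliers(H, pts1, pts2, Tdist=3.0):
    # Apply H to pts1 and measure the distance to the expected pts2.
    projected = cv2.perspectiveTransform(
        pts1.reshape(-1, 1, 2).astype(np.float32), H)
    errors = np.linalg.norm(projected.reshape(-1, 2) - pts2, axis=1)
    return int(np.sum(errors < Tdist))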
But this depends on the point-correspondences you use to compute the homography...
Just think that you are trying to find a transformation that maps lines to lines (from one plane to another), so not every possible configuration of point correspondences will give you a homography that creates nice images.
It is even possible that the homography maps some of the points to infinity.
I am having quite a bit of trouble understanding the workings of plane to plane homography. In particular I would like to know how the opencv method works.
Is it like ray tracing? How does a homogeneous coordinate differ from a scale*vector?
Everything I read talks like you already know what they're talking about, so it's hard to grasp!
Googling "homography estimation" returns this as the first link (at least for me):
http://cseweb.ucsd.edu/classes/wi07/cse252a/homography_estimation/homography_estimation.pdf. This is definitely a poor description and a lot has been omitted. If you want to learn these concepts, reading a good book like Multiple View Geometry in Computer Vision would be far better than reading some short articles. Often these short articles have several serious mistakes, so be careful.
In short, a cost function is defined, and the parameters (the elements of the homography matrix) that minimize this cost function are the answer we are looking for. A meaningful cost function is geometric, that is, it has a geometric interpretation. For the homography case, we want to find H such that, by transforming points from one image to the other, the distance between all the points and their correspondences is minimized. This geometric cost function is nonlinear, which means: (1) in general, an iterative method is needed to solve it, and (2) the iterative method requires an initial starting point.

Here algebraic cost functions enter. These cost functions have no meaningful/geometric interpretation. Designing them is often more of an art, and for a given problem you can usually find several algebraic cost functions with different properties. The benefit of algebraic costs is that they lead to linear optimization problems, so a closed-form solution exists (that is, a one-shot, non-iterative method). The downside is that the solution found is not optimal. Therefore, the general approach is to first optimize an algebraic cost and then use the found solution as a starting point for an iterative geometric optimization. Now, if you google for these cost functions for homography, you will find how they are usually defined.
In case you want to know what method is used in OpenCV, you simply need to have a look at the code:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L81
This is the algebraic cost function, DLT, defined in the mentioned book; if you google "homography DLT" you should find some relevant documents. And then here:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L165
An iterative procedure minimizes the geometric cost function. It seems the Gauss-Newton method is implemented:
http://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
All the above discussion assumes you have correspondences between two images. If some points are matched to incorrect points in the other image, then you have outliers, and the results of the mentioned methods would be completely off. Robust (against outliers) methods enter here. OpenCV gives you two options: 1. RANSAC, 2. LMeDS. Google is your friend here.
Hope that helps.
To answer your question we need to address 4 different questions:
1. Define homography.
2. See what happens when noise or outliers are present.
3. Find an approximate solution.
4. Refine it.
A homography is a 3x3 matrix that maps 2D points. The mapping is linear in homogeneous coordinates: [x2, y2, 1]' ~ H * [x1, y1, 1]', where ' means transpose (to write column vectors as rows) and ~ means that the mapping is up to scale. It is easier to see in Cartesian coordinates (multiplying the numerator and denominator by the same factor doesn't change the result):
x2 = (h11*x1 + h12*y1 + h13)/(h31*x1 + h32*y1 + h33)
y2 = (h21*x1 + h22*y1 + h23)/(h31*x1 + h32*y1 + h33)
You can see that in Cartesian coordinates the mapping is non-linear, but for now just keep this in mind.
We can easily solve the former set of linear equations in homogeneous coordinates using least-squares linear algebra methods (see DLT - Direct Linear Transform; a sketch follows below), but this unfortunately only minimizes an algebraic error in the homography parameters. People care more about another kind of error, namely the error that shifts points around in Cartesian coordinate systems. If there is no noise and no outliers, the two errors can be identical. However, the presence of noise requires us to minimize the residuals in Cartesian coordinates (residuals are just squared differences between the left and right sides of the Cartesian equations). On top of that, the presence of outliers requires us to use a robust method such as RANSAC. It selects the best set of inliers and rejects the outliers to make sure they don't contaminate our solution.
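For reference, a minimal DLT sketch in Python (no point normalization and no robustness, so this is a teaching sketch rather than production code; pts1 and pts2 are assumed lists of (x, y) correspondences):

import numpy as np

def dlt_homography(pts1, pts2):
    # Each correspondence contributes two rows to the linear system A*h = 0.
    A = []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        A.append([-x1, -y1, -1, 0, 0, 0, x2 * x1, x2 * y1, x2])
        A.append([0, 0, 0, -x1, -y1, -1, y2 * x1, y2 * y1, y2])
    A = np.asarray(A)
    # The least-squares solution is the right singular vector associated
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]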
Since RANSAC finds correct inliers by a random trial-and-error method over many iterations, we need a really fast way to compute the homography, and this is the linear approximation that minimizes the parameters' error (the wrong metric) but otherwise is close enough to the final solution (which minimizes the squared point coordinate residuals, the right metric). We use the linear solution as a guess for further non-linear optimization.
The final step is to use our initial guess (the solution of the linear system that minimized the error in the homography parameters) to solve the non-linear equations (which minimize a sum of squared pixel errors). The reason to use squared residuals instead of, say, their absolute values is that in the Gaussian formula (which describes the noise) we have a squared exponent, exp(-(x-mu)^2/(2*sigma^2)), so (skipping some probability formulas) the maximum likelihood solution requires squared residuals.
In order to perform a non-linear optimization one typically employs the Levenberg-Marquardt method. But as a first approximation one can just use gradient descent (note that the gradient points uphill, but we are looking for a minimum, so we go against it, hence the minus sign below). In a nutshell, we go through a set of iterations 1..t..N, selecting the homography parameters at iteration t as param(t) = param(t-1) - k * gradient, where gradient = d_cost/d_param.
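A toy Python illustration of this update rule, using a numerical gradient of the squared reprojection error (the step size k, iteration count, and eps are illustrative; a real implementation would use Levenberg-Marquardt with analytic derivatives):

import numpy as np

def reproj_cost(h, pts1, pts2):
    # Sum of squared Cartesian residuals for homography parameters h.
    H = h.reshape(3, 3)
    cost = 0.0
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        u, v, w = H @ np.array([x1, y1, 1.0])
        cost += (u / w - x2) ** 2 + (v / w - y2) ** 2
    return cost

def refine(H0, pts1, pts2, k=1e-8, iters=200, eps=1e-7):
    h = H0.ravel().astype(float)
    for _ in range(iters):
        c0 = reproj_cost(h, pts1, pts2)
        grad = np.zeros(9)
        for i in range(9):
            hp = h.copy()
            hp[i] += eps
            grad[i] = (reproj_cost(hp, pts1, pts2) - c0) / eps
        h = h - k * grad        # step against the gradient (downhill)
    return (h / h[8]).reshape(3, 3)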
Bonus material: to further reduce the noise in your homography you can try a few tricks: reduce the search space for points (start tracking your points); use different features (lines, conics, etc., which are also transformed by a homography but possibly have a higher SNR); reject impossible homographies to speed up RANSAC (e.g. those that correspond to 'impossible' point movements); use a low-pass filter for small changes in homographies that may be attributed to noise.
I am confused as to how to use the OpenCV findHomography method to compute the optimal transformation.
The way I use it is as follows:
cv::Mat h = cv::findHomography(src, dst, CV_RANSAC, 5.f);
No matter how many times I run it, I get the same transformation matrix. I thought RANSAC is supposed to randomly select a subset of points to do the fitting, so why does it return the same transformation matrix every time? Is it related to some random number initialization? How can I make this behaviour actually random?
Secondly, how can I tune the number of RANSAC iterations in this setup? Usually the number of iterations is based on inlier ratios and things like that.
I thought RANSAC is supposed to randomly select a subset of points to do the fitting, so why does it return the same transformation matrix every time?
RANSAC repeatedly selects a subset of points, then fits a model based upon them, then checks how many data points in the data set are inliers given that fitted model. Once it's done that lots of times, it picks the fitted model that had the most inliers, and refits the model to those inliers.
For any given data set, set of variable model parameters, and rule for what constitutes an inlier, there will exist one or more (but often exactly one) largest possible set of "inliers". For example, given this data set (image from Wikipedia):
... then with some sort of reasonable definition of an outlier, the maximal possible set of inliers any linear model can have is the one in blue below:
Let's call the set of blue points above - the maximal possible set of inliers - I.
If you randomly select a small number of points (e.g. two or three) and draw a line of best fit through them, it's hopefully intuitively obvious that it'll only take you a handful of tries until you hit an iteration where:
all the randomly-selected points you pick are from I, and so
the line of best fit through those points is roughly equal to the line of best fit in the graph above, and so
the set of inliers found on that iteration is exactly I
From that iteration onwards, all further iterations are a waste that cannot possibly improve the model further (although RANSAC has no way of knowing this, since it doesn't magically know when it's found the maximal set of inliers).
If you have a large enough number of iterations relative to the size of your data set, and a large enough proportion of the data set are inliers, then you will eventually find the maximal set of inliers with a close to 100% chance every time you run RANSAC. As a consequence, RANSAC will (almost) always output exactly the same model.
And that's a good thing! Often, you want RANSAC to find the absolute maximal set of inliers and don't want to settle for anything less. If you're getting different results each time you run RANSAC in such a scenario, that's a sign that you want to increase your number of iterations.
(Of course, in the case above we're talking about trying to fit a line through points in a 2D plane, which isn't what findHomography does, but the principle is the same; there will typically still be a single maximal set of inliers and eventually RANSAC will find it.)
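To make that loop concrete, here is a toy RANSAC line-fitting sketch in Python matching the description above (the threshold and iteration count are illustrative):

import numpy as np

def ransac_line(pts, threshold=1.0, n_iters=100):
    best_inliers = np.zeros(len(pts), dtype=bool)
    rng = np.random.default_rng()
    for _ in range(n_iters):
        # Fit a candidate line through two randomly chosen points.
        i, j = rng.choice(len(pts), 2, replace=False)
        p, q = pts[i], pts[j]
        n = np.array([-(q - p)[1], (q - p)[0]])   # line normal
        if np.linalg.norm(n) == 0:
            continue
        n = n / np.linalg.norm(n)
        dist = np.abs((pts - p) @ n)              # point-to-line distances
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():    # keep the largest inlier set
            best_inliers = inliers
    # Refit a least-squares line to the best inlier set.
    slope, intercept = np.polyfit(pts[best_inliers, 0], pts[best_inliers, 1], 1)
    return slope, intercept, best_inliers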
How can I make this behaviour actually random?
Decrease the number of iterations (maxIters) so that RANSAC sometimes fails to find the maximal set of inliers.
But there's generally no reason to do this besides pure intellectual curiosity; you'll basically be deliberately telling RANSAC to output an inferior model.
findHomography will already give you the optimal transformation. The real question is about the meaning of optimal.
For example, with RANSAC you'll have the model with the maximum number of inliers, while with LMEDS you'll have the model with the minimum median error.
You can modify the default behavior by:
changing the number of iterations of RANSAC by setting maxIters (the max number allowed is 2000)
decreasing (or increasing) the ransacReprojThreshold used to separate inliers from outliers (usually between 1 and 10); see the sketch below.
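A minimal sketch of tuning these parameters in Python, assuming src and dst are Nx2 float32 arrays of matched points:

import cv2

H, mask = cv2.findHomography(src, dst, cv2.RANSAC,
                             ransacReprojThreshold=3.0,  # inlier distance in px
                             maxIters=2000)              # RANSAC iteration cap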
Regarding your questions.
No matter how many times I run it, I get the same transformation matrix.
Probably your points are good enough that you always find the optimal model.
I thought RANSAC is supposed to randomly select a subset of points to do the fitting
RANSAC (RANdom SAmple Consensus) first selects a random subset, then checks whether the model built with these points is good enough. If not, it selects another random subset.
How can I make this behaviour actually random?
I can't imagine a scenario where this would be useful, but you can randomly select 4 pairs of points from src and dst and use getPerspectiveTransform (see the sketch below). Unless your points are perfect, you'll get a different matrix for each subset.
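A minimal sketch, assuming src and dst are Nx2 NumPy arrays of matched points:

import numpy as np
import cv2

# Fit to a random 4-point subset each call; repeated calls will generally
# give a different matrix unless the points are perfect.
idx = np.random.choice(len(src), 4, replace=False)
M = cv2.getPerspectiveTransform(src[idx].astype(np.float32),
                                dst[idx].astype(np.float32))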