Why doesn't FFT automatically produce a zero-frequency centered output? - image-processing

There is an operation called Shift, which is performed after the DFT to bring the zero-frequency components to the center of the frequency spectrum.
I have two questions regarding this operation:
Why doesn't (or can't) the DFT automatically center the zero-frequency components?
What happens if we don't perform the Shift operation after the DFT? That is, how does it affect other image-processing tasks?
Can anyone point me to some material about this specific operation named Shift?
- fftw shift zero-frequency to the image center
- Why we shift the zero-frequency component to the center of the spectrum?

The DFT, by definition, uses n=0..N-1 and k=0..N-1, where n is the index into the time-domain signal and k the index into the frequency-domain signal. k also corresponds to the frequency. The DFT is defined this way in analogy to Fourier series.
Since the frequency in the DFT is periodic, one can think of k=N-1 as corresponding to k=-1 instead. The shift function thus moves the upper half of the frequencies to the left of the origin, so they can be more readily interpreted as negative frequencies. But this is solely a convenience for display, as it brings the frequency-domain signal to a form that we're more familiar with (probably because it makes some Fourier analysis easier to explain, so textbooks display it this way, and so we learn about Fourier analysis by looking at frequency plots with the origin in the middle).
For most tasks in image processing, we do not need to shift the origin. Again, it is only for display that this is convenient and pretty.
For example, to compute cross-correlation:
cc = ifft( fft(img1) * conj(fft(img2)) )
Here, the top-left pixel of cc is the origin. If img1==img2, the top-left pixel will be the maximum value. If we had an fft function that shifted the origin to the middle, then the cross-correlation image cc would have its origin in the middle also. After finding the peak, we'd need to do some computations to figure out what the shift between img1 and img2 is. (Not that this is complicated, but it shows that shifting is not necessarily advantageous.)
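To make the point about the peak location concrete, here is a minimal 1-D sketch in plain Python. It assumes a naive O(N²) DFT written with `cmath` instead of a real FFT library, and the helper names `dft`, `roll` and `cross_correlate` are made up for this illustration. With the origin left at index 0, the index of the peak is directly the (circular) shift between the two signals.

```python
import cmath

def dft(x, inverse=False):
    """Naive DFT (or inverse DFT) of a sequence; the origin is at index 0."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                    for m in range(n))
        out.append(total / n if inverse else total)
    return out

def roll(seq, shift):
    """Circularly shift a list to the right by `shift` samples."""
    n = len(seq)
    shift %= n
    return seq[n - shift:] + seq[:n - shift]

def cross_correlate(a, b):
    """Circular cross-correlation: ifft( fft(a) * conj(fft(b)) )."""
    spectrum = [p * q.conjugate() for p, q in zip(dft(a), dft(b))]
    return [v.real for v in dft(spectrum, inverse=True)]

template = [0, 1, 3, 1, 0, 0, 0, 0]
shifted = roll(template, 3)          # same signal, moved 3 samples to the right

cc = cross_correlate(shifted, template)
peak = max(range(len(cc)), key=cc.__getitem__)
print("peak index:", peak)           # equals the shift between the two signals
```

If the output were centered instead, the peak index would have to be corrected by the position of the center before it could be read as a shift.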
When convolving, one often has a spatial-domain kernel with the origin in the middle (as for example in this recent question). In this case, one must shift the origin to the top-left before computing the DFT. But there is no point in shifting the origin of the frequency-domain signals just to multiply them together and then undo the shift. One can simply directly multiply the signals that have the origin in the top-left corner:
kernel = ifftshift(kernel)
filtered = ifft( fft(img) * fft(kernel) )
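As a small 1-D illustration of why the kernel's origin has to be moved to index 0 first, the sketch below evaluates the circular convolution directly (which is what the product of the two DFTs computes, by the convolution theorem). The helper names and the sample values are assumptions made for this example only.

```python
def ifftshift(seq):
    """Move the origin from the middle of the list to index 0."""
    h = len(seq) // 2
    return seq[h:] + seq[:h]

def circular_convolve(x, h):
    """Direct circular convolution; equals ifft(fft(x) * fft(h))."""
    n = len(x)
    return [sum(x[m] * h[(i - m) % n] for m in range(n)) for i in range(n)]

signal = [0, 0, 1, 0, 0, 0, 0]       # impulse at index 2
kernel = [0, 0, 1, 2, 1, 0, 0]       # smoothing kernel, origin in the middle

good = circular_convolve(signal, ifftshift(kernel))
bad = circular_convolve(signal, kernel)

print(good)  # kernel copy centered on the impulse
print(bad)   # same shape, but displaced by half the signal length
```

Skipping the shift does not change the shape of the result, only its position: the output is translated by the distance between the kernel's middle and index 0.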
Note that there are two different shift functions, often called fftshift and ifftshift. fftshift moves the origin from the top-left to the middle; ifftshift moves it from the middle back to the top-left. The two functions do exactly the same thing for even-sized signals (images), but differ if the sizes are odd.
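The difference between the two shifts for odd sizes can be checked with a few lines of plain Python. This is only a sketch on 1-D lists; `dft_frequencies`, `fftshift` and `ifftshift` here are small stand-ins written for this example, following the usual convention that the shift is by floor(N/2) samples.

```python
def dft_frequencies(n):
    """Frequency index for each DFT bin k=0..n-1, upper half read as negative."""
    return [k if k < (n + 1) // 2 else k - n for k in range(n)]

def fftshift(seq):
    """Move the origin from index 0 to the middle."""
    n = len(seq)
    h = n // 2
    return seq[n - h:] + seq[:n - h]

def ifftshift(seq):
    """Move the origin from the middle back to index 0."""
    h = len(seq) // 2
    return seq[h:] + seq[:h]

even = dft_frequencies(4)            # [0, 1, -2, -1]
odd = dft_frequencies(5)             # [0, 1, 2, -2, -1]

print(fftshift(even))                # [-2, -1, 0, 1]
print(fftshift(odd))                 # [-2, -1, 0, 1, 2]

# Even length: both shifts are the same operation.
print(fftshift(even) == ifftshift(even))
# Odd length: only ifftshift undoes fftshift.
print(ifftshift(fftshift(odd)) == odd, fftshift(fftshift(odd)) == odd)
```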


Finding vertexes for construction of minimum size bounding box / convex hull

I have an array of data from a grayscale image that I have segmented sets of contiguous points of a certain intensity value from.
Currently I am doing a naive bounding box routine where I find the minimum and maximum (x,y) [row, col] points. This obviously does not provide the smallest possible box that contains the set of points which is demonstrable by simply rotating a rectangle so the longest axis is no longer aligned with a principal axis.
What I wish to do is find the minimum sized oriented bounding box. This seems to be possible using an algorithm known as rotating calipers, however the implementations of this algorithm seem to rely on the idea that you have a set of vertices to begin with. Some details on this algorithm: https://www.geometrictools.com/Documentation/MinimumAreaRectangle.pdf
My main issue is in finding the vertices within the data that I currently have. I believe I need to at least find candidate vertices in order to reduce the amount of iterations I am performing, since the amount of points is relatively large and treating the interior points as if they are vertices is unnecessary if I can figure out a way to not include them.
Here is some example data that I am working with:
Here's the segmented scene using the naive algorithm, where it segments out the central objects relatively well due to the objects mostly being aligned with the image axes:
In red, you can see the current bounding boxes that I am drawing utilizing 2 vertices: top-left and bottom-right corners of the groups of points I have found.
The rotation part is where my current approach fails, as I am only defining the bounding box using two points, anything that is rotated and not axis-aligned will occupy much more area than necessary to encapsulate the points.
Here's an example with rotated objects in the scene:
Here's the current naive segmentation's performance on that scene, which is drawing larger than necessary boxes around the rotated objects:
Ideally the result would be bounding boxes aligned with the longest axis of the points that are being segmented, which is what I am having trouble implementing.
Here's an image roughly showing what I am really looking to accomplish:
You can also notice unnecessary segmentation done in the image around the borders as well as some small segments, which should be removed with some further heuristics that I have yet to develop. I would also be open to alternative segmentation algorithm suggestions that provide a more robust detection of the objects I am interested in.
I am not sure if this question will be completely clear, therefore I will try my best to clarify if it is not obvious what I am asking.
It's late, but that might still help. This is what you need to do:
expand pixels to make small segments connect larger bodies
find connected bodies
select a sample of pixels from each body
find the MBR ([oriented] minimum bounding rectangle) for selected set
For the first step you can perform dilation. It's somewhat like DBSCAN clustering. For step 3 you can simply select random pixels from a uniform distribution. Obviously, the more pixels you keep, the more accurate the MBR will be. I tested this in MATLAB:
% import image as a matrix of 0s and 1s
oI = ~im2bw(rgb2gray(imread('vSb2r.png'))); % original image
% expand pixels
dI = imdilate(oI,strel('disk',4)); % dilated
% find connected bodies of pixels
CC = bwconncomp(dI);
L = labelmatrix(CC) .* uint8(oI); % labeled
% mark some random pixels
rI = rand(size(oI))<0.3;
sI = L.* uint8(rI) .* uint8(oI); % sampled
% find MBR for a set of connected pixels
for i=1:CC.NumObjects
[Y,X] = find(sI == i);
mbr(i) = getMBR( X, Y ); % getMBR is a user-written helper (not shown here), not a built-in
end
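The getMBR helper is not shown above. One possible way to write such a function, sketched here in plain Python as an assumption rather than as the answer's actual code, is to reduce the pixels to their convex hull (which also yields the candidate vertices the question asks about) and then try the direction of every hull edge, since a minimum-area rectangle has one side collinear with a hull edge. This version is O(h²) in the number of hull vertices; true rotating calipers would do the edge sweep in O(h).

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_area_rect(points):
    """Return (area, angle_in_radians, width, height) of the smallest enclosing rectangle."""
    hull = convex_hull(points)
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Project the hull onto axes aligned with this edge.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        w, h = max(us) - min(us), max(vs) - min(vs)
        if best is None or w * h < best[0]:
            best = (w * h, theta, w, h)
    return best

# A square rotated by 45 degrees, plus one interior point.
diamond = [(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]
area, angle, width, height = min_area_rect(diamond)
print(len(convex_hull(diamond)), round(area, 6))
```

For the rotated square, the axis-aligned box has area 4 while the oriented rectangle has area 2; the interior point is discarded by the hull step and never enters the search.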
You can also remove some ineffective pixels using some more processing and morphological operations:
remove holes
find boundaries
find skeleton
I = imfill(I, 'holes');
I = bwmorph(I,'remove');
I = bwmorph(I,'skel',Inf);

Is Canny edge detection rotationally invariant?

Suppose that the Canny edge detector successfully detects an edge in an image. The edge is then rotated by θ, where the relationship between a point (x, y) on the original edge and a point (x′, y′) on the rotated edge is defined as x′ = x cos θ; y′ = x sin θ.
Will the rotated edge be detected using the same Canny edge detector?
(I think we should find answer considering that the detection of an edge by the Canny edge detector depends only on the magnitude of its derivative.)
The answer is both yes and no, and which one you go for depends on how literally you take the question.
First of all, we're dealing with a rectangular grid, so given an integer location (x,y), the corresponding point (x',y') in a rotated image is highly likely not an integer location. And considering that the output of Canny is a set of points, and not a smooth function that can be interpolated, it would be difficult to establish a correspondence between the set resulting from the rotated and the one resulting from the original image.
Think for example about the number of pixels on a discrete line of a given length at 0 degrees and at 45 degrees. (Hint: the line at 45 degrees has sqrt(2) times fewer pixels.)
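The hint can be checked with a short sketch. It assumes a simple rasterization that takes one pixel per step along the longer axis (so diagonal neighbours count as connected); `line_pixels` is a name invented for this example.

```python
import math

def line_pixels(length, angle_deg):
    """Pixels of a discrete line from the origin with the given Euclidean length."""
    a = math.radians(angle_deg)
    x1, y1 = round(length * math.cos(a)), round(length * math.sin(a))
    steps = max(abs(x1), abs(y1))
    if steps == 0:
        return {(0, 0)}
    return {(round(i * x1 / steps), round(i * y1 / steps)) for i in range(steps + 1)}

n0 = len(line_pixels(100, 0))
n45 = len(line_pixels(100, 45))
print(n0, n45, n0 / n45)   # the ratio is close to sqrt(2)
```

Both lines have the same length, yet the diagonal one is represented by far fewer samples, so a pixel-by-pixel correspondence between the two edge sets cannot exist.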
But if you take the question more generally and interpret it as "will an edge that is detected in the original image also be detected after rotating the image by θ degrees?" then the answer is yes, in theory.
Of course practice is always a bit different than theory. The details of the implementation matter here. And there is always numerical imprecision to contend with.
Let's start by assuming the rotation is computed correctly, with a precise interpolation scheme (cubic, Lanczos) and not rounded after to uint8 or something (i.e. we're computing using floating-point values).
If you read the original paper by Canny, you'll see he proposes using Gaussian derivatives as the best compromise between compact support and computational precision. I have seen few implementations that actually do. Typically I see a convolution with a Gaussian and then Sobel derivatives. Especially for smaller sigmas (less smoothing) the difference can be quite large. Gaussian derivatives are rotationally invariant, Sobel derivatives are not.
The next step in the algorithm is non-maximum suppression. This is where the continuous gradient is converted to a set of points. For each pixel, it checks to see if it is a local maximum in the direction of the gradient. Because this is done per pixel, a different set of locations are tested in the rotated image compared to the original. Nonetheless, it should detect points along the same ridges in both cases.
Next, a hysteresis threshold is applied. This is a two-threshold operation that keeps pixels above a lower threshold as long as at least one pixel above a second, higher threshold is present in the same connected component. This is where differences could occur between the rotated and the original image. Remember we're dealing with a set of pixels. We have sampled the continuous gradient function at discrete points. There could be an edge that has one pixel above the second threshold in one version of the image, but not in the other. This would only occur for edges very close to the chosen threshold, of course.
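The sensitivity described here is easy to reproduce. The following is a minimal sketch of a hysteresis threshold on a small grid of gradient magnitudes, assuming 8-connectivity; it is written for illustration and is not taken from any particular Canny implementation.

```python
from collections import deque

def hysteresis(mag, low, high):
    """Keep pixels >= low that are connected (8-neighbourhood) to a pixel >= high."""
    rows, cols = len(mag), len(mag[0])
    keep = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if mag[r][c] >= high:        # strong pixels seed the search
                keep[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and not keep[rr][cc] and mag[rr][cc] >= low):
                    keep[rr][cc] = True
                    queue.append((rr, cc))
    return keep

# The same weak edge, sampled so that one pixel just reaches or just misses
# the high threshold.
edge_a = [[5, 5, 8.1, 5, 5]]
edge_b = [[5, 5, 7.9, 5, 5]]
print(hysteresis(edge_a, low=4, high=8))   # whole edge kept
print(hysteresis(edge_b, low=4, high=8))   # whole edge dropped
```

A difference of 0.2 in a single sample decides whether the entire connected edge survives, which is why resampling after a rotation can change the result for edges near the threshold.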
Next comes a thinning. Because the non-maximum suppression can yield points along a thicker line, a thinning operation is applied that removes pixels from the set that are not needed to maintain connectivity of the lines. Which pixels are selected here will also differ between rotated and original images, but this does not change the geometry of the solution, so we still have the same set of points.
So, the answer is yes and no. :)
Note that the same logic applies to translation.

Finding edges in a height map

I want to find sharp edges in a heightmap image, while ignoring shallow edges.
OpenCV offers multiple approaches to finding edges in a 2d Image: Canny, Sobel, etc.
However, all these approaches work by comparing the intensity values on both sides of the edge.
If the 2D Image represents a height map of a 3D object, then this results in some weird behaviour.
In a height map, the height of a 3D object at a given X/Y coordinate is represented as the intensity of the 2D Pixel at that X/Y coordinate:
In the above picture, at the edge B the intensity changes only slightly between the left and right side, even though it is a sharp corner.
At the edge A, there is a big change in the intensity between pixels on the left side of the edge and the right, even though it is only a shallow angle.
So there is no threshold for Canny or Sobel that will preserve the sharp edge but filter the shallow edge.
(In the above example, the edge B has one side with an ascending slope, and one side with a descending slope. I could filter for this feature; but that would remove the edges C and D as well)
How can I get a binary edge image, containing only edges above a certain angle? (e.g. edge B, C, and D, but not A)
Or alternatively, how can I get a gradient derivative image, where the intensity of each pixel is proportional to the angle of the edge at that pixel?
Probably you'll want to use second derivative instead of first for this task.
Here's my intuition: the derivative of height (intensity in your case) on an evenly spaced grid is proportional to the tangent of the surface slope angle between sampling points (or at sampling points if you use a two-sided derivative approximation), so its arctan gives the slope angle. But since you want to detect sharp edges, you are looking for the derivative of that slope angle at the sampling points. This means that you can set a threshold on the derivative of the arctan of the derivative of intensity to achieve your goal (luckily there's no "need to go deeper" :) )
You will have to be extra careful when taking the derivative of the "slope angles" that you get: depending on the coordinate system you may come across an ambiguity in the angle difference (there are 2 ways to get from one angle to another, which are different in the general case; you're looking for the "shorter" one). You can look for a possible solution here
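A 1-D sketch of this idea, in plain Python, is given below. It assumes unit grid spacing and height values in the same units as the spacing, and it sidesteps the angle-wrapping issue because slope angles along a single profile stay within (-90°, 90°); in 2-D the gradient direction would also have to be handled. The function name and the sample profiles are made up for illustration.

```python
import math

def bend_angles(heights, spacing=1.0):
    """Absolute change in slope angle (degrees) at each interior sample."""
    slopes = [math.degrees(math.atan2(b - a, spacing))
              for a, b in zip(heights, heights[1:])]
    return [abs(s2 - s1) for s1, s2 in zip(slopes, slopes[1:])]

# Shallow crease: slope changes from 0.1 to 0.3, but the height climbs a lot.
shallow = [0.0, 0.1, 0.2, 0.3, 0.6, 0.9, 1.2]
# Sharp ridge: slope flips from +1 to -1, heights on both sides are similar.
sharp = [0, 1, 2, 3, 2, 1, 0]

threshold_deg = 30
print(max(bend_angles(shallow)) > threshold_deg)   # False: filtered out
print(max(bend_angles(sharp)) > threshold_deg)     # True: kept
```

A first-derivative detector would respond more strongly to the shallow crease's steep side than to the ridge's apex; thresholding the change in slope angle reverses that ranking.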
I have a rather simple approach that I came across while reading a blog post.
It involves computing the median value of the gray scale image. Using this value we can now set two threshold values:
lower: max(0, (1.0 - 0.33) * v)
upper: min(255, (1.0 + 0.33) * v)
Now pass these two values as parameters into the cv2.Canny() function.
You will now be able to perform an optimized edge detection given any image. The crux of this answer depends on the median value of the image which varies for different images.
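The threshold computation can be sketched without OpenCV as follows. The function name is hypothetical, the pixels are assumed to be 8-bit grayscale values given as a flat list, and truncating to integers is a choice made for this example; the two returned values would then be passed as the threshold arguments of cv2.Canny().

```python
import statistics

def auto_canny_thresholds(pixels, sigma=0.33):
    """Lower and upper Canny thresholds derived from the median gray value."""
    v = statistics.median(pixels)
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    return lower, upper

print(auto_canny_thresholds([10, 100, 100, 200, 250]))   # median 100
print(auto_canny_thresholds([230, 240, 250]))            # upper clamps at 255
```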
If I understand your question correctly, "what you need is basically a corner with high intensity values".
If that is so then look for Harris corner detector which would help you to find points with high gradient change in both direction.
Once you detect the corners you can filter the corners which have high intensity by using a suitable threshold.

Documentation of CvStereoBMState for disparity calculation with cv::StereoBM

The application of Konolige's block matching algorithm is not sufficiently explained in the OpenCV documentation. The parameters of CvStereoBMState influence the accuracy of the disparities calculated by cv::StereoBM. However, those parameters are not documented. I will list those parameters below and describe what I understand. Maybe someone can add a description of the parameters that are unclear.
preFilterType: Determines, which filter is applied on the image before the disparities are calculated. Can be CV_STEREO_BM_XSOBEL (Sobel filter) or CV_STEREO_BM_NORMALIZED_RESPONSE (maybe differences to mean intensity???)
preFilterSize: Window size of the prefilter (width = height of the window, negative value)
preFilterCap: Clips the output to [-preFilterCap, preFilterCap]. What happens to the values outside the interval?
SADWindowSize: Size of the compared windows in the left and in the right image, where the sums of absolute differences are calculated to find corresponding pixels.
minDisparity: The smallest disparity, which is taken into account. Default is zero, should be set to a negative value, if negative disparities are possible (depends on the angle between the cameras views and the distance of the measured object to the cameras).
numberOfDisparities: The disparity search range [minDisparity, minDisparity+numberOfDisparities].
textureThreshold: Calculate the disparity only at locations, where the texture is larger than (or at least equal to?) this threshold. How is texture defined??? Variance in the surrounding window???
uniquenessRatio: Cited from calib3d.hpp: "accept the computed disparity d* only if SAD(d) >= SAD(d*)*(1 + uniquenessRatio/100.) for any d != d*+/-1 within the search range."
speckleRange: Unsure.
trySmallerWindows: ???
roi1, roi2: Calculate the disparities only in these regions??? Unsure.
speckleWindowSize: Unsure.
disp12MaxDiff: Unsure, but a comment in calib3d.hpp says, that a left-right check is performed. Guess: Pixels are matched from the left image to the right image and from the right image back to the left image. The disparities are only valid, if the distance between the original left pixel and the back-matched pixel is smaller than disp12MaxDiff.
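The uniquenessRatio rule quoted from calib3d.hpp can be read as the following check on the SAD costs of a single pixel. This is a sketch of the rule as stated in that comment, with a hypothetical function name; it is not OpenCV's implementation.

```python
def passes_uniqueness(sad, uniqueness_ratio):
    """sad[d] is the matching cost at disparity index d within the search range."""
    best = min(range(len(sad)), key=sad.__getitem__)          # d*
    limit = sad[best] * (1 + uniqueness_ratio / 100.0)
    # Every disparity other than d* and its two neighbours must be clearly worse.
    return all(cost >= limit
               for d, cost in enumerate(sad) if abs(d - best) > 1)

print(passes_uniqueness([50, 40, 10, 35, 60], 15))   # clear winner: accepted
print(passes_uniqueness([11, 40, 10, 35, 60], 15))   # rival within 15%: rejected
```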
speckleWindowSize and speckleRange are parameters for the function cv::filterSpeckles. Take a look at OpenCV's documentation.
cv::filterSpeckles is used to post-process the disparity map. It replaces blobs of similar disparities (the difference between two adjacent values does not exceed speckleRange) whose size is less than or equal to speckleWindowSize (the number of pixels forming the blob) with the invalid disparity value (either short -16 or float -1.f).
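To illustrate the behaviour just described, here is a plain-Python sketch of a speckle filter on a small disparity grid. It assumes 4-connectivity and is only a model of the description above, not the code of cv::filterSpeckles; the parameter names are chosen for this example.

```python
def filter_speckles(disp, invalid, max_size, max_diff):
    """Replace small blobs of similar disparity with the `invalid` value."""
    rows, cols = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or disp[r][c] == invalid:
                continue
            # Flood-fill the blob of pixels whose neighbours differ by <= max_diff.
            seen[r][c] = True
            blob, stack = [], [(r, c)]
            while stack:
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and disp[ny][nx] != invalid
                            and abs(disp[ny][nx] - disp[y][x]) <= max_diff):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(blob) <= max_size:       # plays the role of speckleWindowSize
                for y, x in blob:
                    out[y][x] = invalid
    return out

disparity = [[10, 10, 10, 10],
             [10, 50, 10, 10],
             [10, 10, 10, 10]]
cleaned = filter_speckles(disparity, invalid=-1, max_size=2, max_diff=1)
print(cleaned)   # the isolated 50 is replaced, the large region is untouched
```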
The parameters are better described in the Python tutorial on depth map from stereo images. The parameters seem to be the same.
texture_threshold: filters out areas that don't have enough texture for reliable matching.
Speckle range and size: Block-based matchers often produce "speckles" near the boundaries of objects, where the matching window catches the foreground on one side and the background on the other. In this scene it appears that the matcher is also finding small spurious matches in the projected texture on the table. To get rid of these artifacts we post-process the disparity image with a speckle filter controlled by the speckle_size and speckle_range parameters. speckle_size is the number of pixels below which a disparity blob is dismissed as "speckle." speckle_range controls how close in value disparities must be to be considered part of the same blob.
Number of disparities: How many pixels to slide the window over. The larger it is, the larger the range of visible depths, but more computation is required.
min_disparity: the offset from the x-position of the left pixel at which to begin searching.
Uniqueness ratio: Another post-filtering step. If the best matching disparity is not sufficiently better than every other disparity in the search range, the pixel is filtered out. You can try tweaking this if texture_threshold and the speckle filtering are still letting through spurious matches.
prefilter_size and prefilter_cap: The pre-filtering phase, which normalizes image brightness and enhances texture in preparation for block matching. Normally you should not need to adjust these.
Also check out this ROS tutorial on choosing stereo parameters.

opencv SimpleBlobDetector filterByInertia meaning?

I don't understand what filterByInertia means... neither do I understand the documentation's little description :
By ratio of the minimum inertia to maximum inertia. Extracted blobs will have this ratio between minInertiaRatio (inclusive) and maxInertiaRatio (exclusive).
The above image pretty much explains what the different filter parameters do. SimpleBlobDetector is happiest when it sees a circular blob, and different filters filter out different kinds of deviations from the circular shape.
Inertia measures the ratio of the minor and major axes of a blob.
The figure also shows the difference between circularity and inertia. I have copied this figure from Blob Detection Tutorial at LearnOpenCV.com
I've been wondering this for a while also; the OpenCV documentation isn't very helpful when it comes to blob detection.
Based on the descriptions of other blob analyzers, the inertia of a blob is "the inertial resistance of the blob to rotation about its principal axes". It depends on how the mass of the blob (I guess in this case the area) is distributed throughout the blob's shape.
There's a lot of mathy stuff involved -- most of which I don't remember how to do -- but the result at the bottom of this page on the properties of binary images sums it up fairly well (blob detection is done by converting the input image to a series of binary images):
The ratio gives us some idea of how rounded the object is. This ratio will be 0 for a line and 1 for a circle.
So basically, by specifying minInertiaRatio and maxInertiaRatio you can filter the blobs based on how elongated they are. An inertia ratio of 0 will yield elongated blobs (closer to lines) and an inertia ratio of 1 will yield blobs where the area is more concentrated toward the center (closer to circles).
Here's a physical interpretation:
If you cut the blob out of a piece of card, you could find its center of gravity, attach an axle to it crossing this point (the axle being parallel to the card), spin it, and measure its moment of inertia. Depending on the shape, you may get different values according to how you place the axle. For an ellipse, you get the lowest value when the axle is attached along the long (major) axis, and the largest when the axle is placed along the short axis (so that more of the card is far from the axle). For a circle the inertia is always the same, of course.
If there are different values, there will always be a 'max' inertia at some orientation, and a 'min' with the axle placed 90 degrees away from the 'max'. The inertia ratio is simply the ratio between these inertias, min/max.
For shapes which are not ellipses, the metric tells you whether the overall shape is roughly elongated, or roughly the same size in all directions; without caring in particular about an uneven boundary or cuts and concavities (which roundness and convexity look at).
Mathematically, it does something like this:
Consider the set of points within the blob to be a population of (x,y) samples
Find the mean of these, and the covariance matrix x vs. y
Find the two eigenvalues of the covariance matrix (which are the same as its singular values, due to the nature of this matrix)
The inertia ratio is the ratio between these two values, smallest/largest.
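The four steps above can be sketched in plain Python as follows. This follows the description in this answer (covariance eigenvalues of the blob's pixel coordinates) and is not copied from OpenCV's source; `inertia_ratio` is a name chosen for the example, and the 2×2 eigenvalues are computed in closed form.

```python
import math

def inertia_ratio(points):
    """Smallest/largest eigenvalue of the covariance matrix of (x, y) samples."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    largest, smallest = half_trace + root, half_trace - root
    return smallest / largest if largest > 0 else 1.0

line = [(i, 0) for i in range(10)]
square = [(x, y) for x in range(5) for y in range(5)]
bar = [(x, y) for x in range(9) for y in range(3)]

print(inertia_ratio(line))     # 0.0 for a line
print(inertia_ratio(square))   # 1.0 for a shape equally spread in all directions
print(inertia_ratio(bar))      # in between for an elongated blob
```

Because the eigenvalues are variances, this ratio behaves like the square of the minor-to-major extent ratio: the 9×3 bar comes out near 0.1, not 0.33.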
