Fast calculation of sum of absolute difference with Integral Image

Fast calculation of sum of absolute difference with Integral Image - image-processing

I am trying to find the fast way of block matching between 2 images and I am using sum of absolute difference (SAD) as similarity metric.
Given a reference block A at NxN size in frame 1, a full search for the best matched block within a search window of size (2W+N)*(2W+N) in frame 2, where W stands for the maximum allowed displacement. Full search block matching requires (2W+1)^2 comparison.
Integral image can calculate the sum of pixels in a block in any image quickly but I could not work out how to calculate the sum of absolute difference (SAD) here.
Is it possible to use the integral image to calculate SAD here? Thanks.

Related

Explain difference between opencv's template matching methods in non-mathematical way

I'm trying to use opencv to find some template in images. While opencv has several template matching methods, I have big trouble to understand the difference and when to use which by looking at their mathematic equization:
CV_TM_SQDIFF
CV_TM_SQDIFF_NORMED
CV_TM_CCORR
CV_TM_CCORR_NORMED
CV_TM_CCOEFF
Can someone explain the major difference between all these method in a non-mathematical way?

The general idea of template matching is to give each location in the target image I, a similarity measure, or score, for the given template T. The output of this process is the image R.
Each element in R is computed from the template, which spans over the ranges of x' and y', and a window in I of the same size.
Now, you have two windows and you want to know how similar they are:
CV_TM_SQDIFF - Sum of Square Differences (or SSD):
Simple euclidian distance (squared):
Take every pair of pixels and subtract
Square the difference
Sum all the squares
CV_TM_SQDIFF_NORMED - SSD Normed
This is rarely used in practice, but the normalization part is similar in the next methods.
The nominator term is same as above, but divided by a factor, computed from the
- square root of the product of:
sum of the template, squared
sum of the image window, squared
CV_TM_CCORR - Cross Correlation
Basically, this is a dot product:
Take every pair of pixels and multiply
Sum all products
CV_TM_CCOEFF - Cross Coefficient
Similar to Cross Correlation, but normalized with their Covariances (which I find hard to explain without math. But I would refer to
mathworld
or mathworks
for some examples

Find High Frequencies with Discrete Fourier Transform [OpenCV]

I want to determine image sharpness by the amount of high frequencies within the image. As far as I understand the dft() function from OpenCV returns two matrices with real and complex numbers.
This is where I am stuck. How can I determine the amount of high frequencies from this data?
I am thankful for every hint/link which could provide me with a better understanding.
Greetings

Make FT
Calculate magnitude of result
Now you have 2D matrix. Consider upper left quadrant (other are mirrors for real source).
Here Magn[0][0] entry corresponds to zero frequency, and Magn[(n-1)/2][(n-1)/2] entry corresponds to the highest frequency.
Left upper part of this submatrix contains low-frequency samples, so you can calculate sum of values in this part and in the rest part and compare these sums. For example (pseudocode):
cvIntegral(Magn, Rect(0..n/4, 0..n/4)) compare with
cvIntegral(Magn, Rect(0..n/2, 0..n/2)) - cvIntegral(Magn, Rect(0..n/4, 0..n/4))

1D discrete denoising of image by variational method (the length of smoothing term)

As of speaking about this 1D discrete denoising via variational calculus I would like to know how to manipulate the length of smoothing term as long as it should be N-1, while the length of data term is N. Here the equation:
E=0;
for i=1:n
E+=(u(i)-f(i))^2 + lambda*(u[i+1]-n[i])
E is the cost of actual u in optimization process
f is given image (noised)
u is output image (denoised)
n is the length of 1D vector.
lambda>=0 is weight of smoothness in optimization process (described around 13 minute in video)
here the length of second term and first term mismatch. How to resolve this?
More importantly, I would like to use linear equation system to solve this problem.

This is nowhere near my cup of tea but I think you are referring to the fact that:
u[i+1]-n[i] is accessing the next pixel making the term work only on resolution 1 pixel smaller then original f image
In graphics and filtering is this usually resolved in 2 ways:
use default value for pixels outside image resolution
you can set default or neutral(for the process) color to those pixels (like black)
use color of the closest neighbor inside image resolution
interpolate the mising pixels (bilinear,bicubic...)
I think the first choice is not suitable for your denoising technique.
change the resolution of output image
Usually after some filtering techniques (via FIR,etc) the result is 1 pixel smaller then the input to resolve the missing data problem. In your case it looks like your resulting u image should be 1 pixel bigger then input image f while computing cost functions.
So either enlarge it via bullet #1 and when the optimization is done you can crop back to original size.
Or virtually crop the f one pixel down (just say n'=n-1) before computing cost function so you avoid access violations (and also you can restore back after the optimization...)

Documentation of CvStereoBMState for disparity calculation with cv::StereoBM

The application of Konolige's block matching algorithm is not sufficiantly explained in the OpenCV documentation. The parameters of CvStereoBMState influence the accuracy of the disparities calculated by cv::StereoBM. However, those parameters are not documented. I will list those parameters below and describe, what I understand. Maybe someone can add a description of the parameters, which are unclear.
preFilterType: Determines, which filter is applied on the image before the disparities are calculated. Can be CV_STEREO_BM_XSOBEL (Sobel filter) or CV_STEREO_BM_NORMALIZED_RESPONSE (maybe differences to mean intensity???)
preFilterSize: Window size of the prefilter (width = height of the window, negative value)
preFilterCap: Clips the output to [-preFilterCap, preFilterCap]. What happens to the values outside the interval?
SADWindowSize: Size of the compared windows in the left and in the right image, where the sums of absolute differences are calculated to find corresponding pixels.
minDisparity: The smallest disparity, which is taken into account. Default is zero, should be set to a negative value, if negative disparities are possible (depends on the angle between the cameras views and the distance of the measured object to the cameras).
numberOfDisparities: The disparity search range [minDisparity, minDisparity+numberOfDisparities].
textureThreshold: Calculate the disparity only at locations, where the texture is larger than (or at least equal to?) this threshold. How is texture defined??? Variance in the surrounding window???
uniquenessRatio: Cited from calib3d.hpp: "accept the computed disparity d* only ifSAD(d) >= SAD(d*)(1 + uniquenessRatio/100.) for any d != d+/-1 within the search range."
speckleRange: Unsure.
trySmallerWindows: ???
roi1, roi2: Calculate the disparities only in these regions??? Unsure.
speckleWindowSize: Unsure.
disp12MaxDiff: Unsure, but a comment in calib3d.hpp says, that a left-right check is performed. Guess: Pixels are matched from the left image to the right image and from the right image back to the left image. The disparities are only valid, if the distance between the original left pixel and the back-matched pixel is smaller than disp12MaxDiff.

speckleWindowSize and speckleRange are parameters for the function cv::filterSpeckles. Take a look at OpenCV's documentation.
cv::filterSpeckles is used to post-process the disparity map. It replaces blobs of similar disparities (the difference of two adjacent values does not exceed speckleRange) whose size is less or equal speckleWindowSize (the number of pixels forming the blob) by the invalid disparity value (either short -16 or float -1.f).

The parameters are better described in the Python tutorial on depth map from stereo images. The parameters seem to be the same.
texture_threshold: filters out areas that don't have enough texture
for reliable matching
Speckle range and size: Block-based matchers
often produce "speckles" near the boundaries of objects, where the
matching window catches the foreground on one side and the background
on the other. In this scene it appears that the matcher is also
finding small spurious matches in the projected texture on the table.
To get rid of these artifacts we post-process the disparity image with
a speckle filter controlled by the speckle_size and speckle_range
parameters. speckle_size is the number of pixels below which a
disparity blob is dismissed as "speckle." speckle_range controls how
close in value disparities must be to be considered part of the same
blob.
Number of disparities: How many pixels to slide the window over.
The larger it is, the larger the range of visible depths, but more
computation is required.
min_disparity: the offset from the x-position
of the left pixel at which to begin searching.
uniqueness_ratio:
Another post-filtering step. If the best matching disparity is not
sufficiently better than every other disparity in the search range,
the pixel is filtered out. You can try tweaking this if
texture_threshold and the speckle filtering are still letting through
spurious matches.
prefilter_size and prefilter_cap: The pre-filtering
phase, which normalizes image brightness and enhances texture in
preparation for block matching. Normally you should not need to adjust
these.
Also check out this ROS tutorial on choosing stereo parameters.

Comparison metric for two open contours

I'm validating an image segmentation algorithm applied to 2D images. The algorithm generates a contour segment, i.e. a set of connected pixels that form a freecurve in 2D space. The idea is to compare this set of pixels with a ground-truth, in my case another contour segment manually traced by an expert. An image showing what would be a segmentation result and the corresponding manual (ground-truth) segmentation is shown below:
I'm trying to think of an adequate comparison metric to validate the segmentation results. Ideally the best metric would be the point-to-point euclidean distance between corresponding pairs of pixels on each segment, however (as seen in previous figure) the segments don't have the same length (i.e. differ by the total number of pixels) so pixel-to-pixel comparisons have to be discarded.
Can you suggest me an adequate metric for validating my algorithm? Thanks for any suggestion!

For each pixel in the ground truth, take the distance to the nearest pixel in the segmentation result. Then take the sum of that for all ground truth pixels as the total error.
That's basically recall weighted by distance. If you start with the pixels in the result, it would resemble precision instead.

If the curves are closed, you can compute the area between the curves. If you can tell which pixels belong to a segment, that is as easy as computing XOR set of the 2 pixel sets.
Here is an example using that I've created using Matlab:

You could divide each line into n segments of equal length, then compute the euclidean distance between each segment and its pair on the other line.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart