Explain difference between opencv's template matching methods in non-mathematical way - opencv

I'm trying to use OpenCV to find a template in images. While OpenCV has several template matching methods, I have trouble understanding the differences and deciding when to use which one just by looking at their mathematical equations:
CV_TM_SQDIFF
CV_TM_SQDIFF_NORMED
CV_TM_CCORR
CV_TM_CCORR_NORMED
CV_TM_CCOEFF
Can someone explain the major differences between all these methods in a non-mathematical way?

The general idea of template matching is to give each location in the target image I a similarity measure, or score, for the given template T. The output of this process is the image R.
Each element in R is computed from the template, which spans over the ranges of x' and y', and a window in I of the same size.
Now, you have two windows and you want to know how similar they are:
CV_TM_SQDIFF - Sum of Square Differences (or SSD):
Simple Euclidean distance (squared):
Take every pair of pixels and subtract
Square the difference
Sum all the squares
CV_TM_SQDIFF_NORMED - SSD Normed
This is rarely used in practice, but the normalization part is similar in the next methods.
The numerator term is the same as above, but it is divided by a normalization factor: the square root of the product of
the sum of the squared template pixel values
the sum of the squared image-window pixel values
CV_TM_CCORR - Cross Correlation
Basically, this is a dot product:
Take every pair of pixels and multiply
Sum all products
CV_TM_CCOEFF - Cross Coefficient
Similar to Cross Correlation, but normalized with their covariances (which I find hard to explain without math, but I would refer to mathworld or mathworks for some examples).
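In code, all of these methods go through the same OpenCV call; only the method flag and the interpretation of the score change. A minimal Python sketch (the file names are placeholders, and TM_CCOEFF is just one possible choice):

import cv2

# Load a scene and a template (placeholder file names), both as grayscale.
image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# R holds one score per window position; its shape is (H - h + 1, W - w + 1).
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

# For TM_SQDIFF / TM_SQDIFF_NORMED a low score means a good match (use min_loc);
# for the CCORR and CCOEFF variants a high score means a good match (use max_loc).
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(image, top_left, bottom_right, 255, 2)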

Related

Confusion in first and second order derivatives in image processing

In image processing, the Laplacian filter adds the two second order derivatives, one in x direction and the other in y direction.
However, I am confused when we use first order derivative filters. In that case, we don't add the two first order derivatives. Instead we use the magnitude of the two first order derivatives, that is the L2 norm of the gradient.
I want to know why we don't add these two first-order derivatives, as the Laplacian does, when we use first-order derivative filters. Thanks a lot.
The Laplacian is defined as the trace of the Hessian matrix. The Hessian matrix collects all second-order derivatives, which also include things like d^2/dxdy. The diagonal entries of the Hessian are the second derivatives along each axis, so the trace is their sum. [You should look into the determinant of the Hessian too, it's an interesting operator.]
The gradient is a vector composed of the partial derivatives along each axis. Its magnitude (norm) is the square root of the sum of the squared elements.
These things are different because they have a different meaning and a different purpose.
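To make the difference concrete, here is a small Python/OpenCV sketch (the input file name is a placeholder) that builds both operators from Sobel derivatives: the gradient magnitude as the L2 norm of the two first derivatives, and the Laplacian as the sum of the two pure second derivatives.

import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)

# First-order derivatives, combined via the L2 norm (gradient magnitude).
dx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
dy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
gradient_magnitude = np.sqrt(dx ** 2 + dy ** 2)

# Second-order derivatives, combined by summation (the Laplacian,
# i.e. the trace of the Hessian).
dxx = cv2.Sobel(img, cv2.CV_64F, 2, 0, ksize=3)
dyy = cv2.Sobel(img, cv2.CV_64F, 0, 2, ksize=3)
laplacian = dxx + dyy  # conceptually what cv2.Laplacian(img, cv2.CV_64F) computes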

How can we transform an image from a plane to a vector?

I'm new to Computer Vision, but I want to explore this domain.
Now I learn how to detect spatial-temporal interest points. To this, I've read this article of Ivan Laptev.
So, I'm stuck on the transformation of an image from R^2 (the plane) to R (a scalar value) (in this article, at the start of paragraph 2.1):
In the spatial domain, we can model an image f^sp: R^2 -> R by its linear scale-space representation (Witkin, 1983; Koenderink and van Doorn, 1992; Lindeberg, 1994; Florack, 1997)
I don't understand how we get a single value (R) from an image defined on R^2.
Can somebody give good article about this, or explain by himself?
As I understand it, we use convolution with a Gaussian kernel for this. But after convolution we still get an image over R^2.
If you model your image as a function f(x, y), you pass values in R^2 (one dimension for x and one for y), and you get a one-dimensional output (a scalar) for each pair of x and y, right? Just simple math :-)
The paragraph just states that the function operates on a neighborhood in R^2 and returns a scalar. This is true for a Gaussian: it takes a neighborhood around a point and returns a scalar, which is a weighted sum of the pixels in the neighborhood as a function of their location relative to the center of the neighborhood.
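As a toy illustration (not from the paper), here is a small Python sketch of this "neighborhood in, scalar out" view of the Gaussian:

import numpy as np

def gaussian_response(patch, sigma=1.0):
    # patch: a (2k+1) x (2k+1) neighborhood of pixel values centred on the
    # point of interest. The return value is a single scalar (R^2 -> R).
    k = patch.shape[0] // 2
    ys, xs = np.mgrid[-k:k + 1, -k:k + 1]
    weights = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    weights /= weights.sum()
    return float((weights * patch).sum())

# Example: a 5x5 neighborhood of pixel values is reduced to one number.
patch = np.random.rand(5, 5)
value = gaussian_response(patch, sigma=1.0)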

superpixels extracted via energy-driven sampling (SEEDS)

I am interested in superpixels extracted via energy-driven sampling (SEEDS), which is a method of image segmentation using superpixels. This is also what OpenCV uses to create superpixels. I am having trouble finding documentation on the SEEDS algorithm. OpenCV gives a very general description which can be found here.
I am looking for a more in depth description on how SEEDS functions (either a general walk through or a mathematical explanation). Any links or thoughts concerning the algorithm would be much appreciated! I can't seem to find any good material. Thanks!
I will first go through some general links and resources and then try to describe the general idea of the algorithm.
SEEDS implementations:
You obviously already saw the documentation here. A usage example for OpenCV's SEEDS implementation can be found here: Itseez/opencv_contrib/modules/ximgproc/samples/seeds.cpp; it lets you adapt the number of superpixels, the number of levels and other parameters live - so after reading up on the idea behind SEEDS you should definitely try the example. The original implementation, as well as a revised implementation (part of my bachelor thesis), can be found on GitHub: davidstutz/superpixels-revisited/lib_seeds and davidstutz/seeds-revised. The implementations should be pretty comparable, though.
Publication and other resources:
The paper was released on arXiv: arxiv.org/abs/1309.3848. A somewhat shorter description (which may be easier to follow) is available on my website: davidstutz.de/efficient-high-quality-superpixels-seeds-revised. The provided algorithm description should be easy to follow and -- in the best case -- allow you to implement SEEDS (see the "Algorithm" section of the article). A more precise description can also be found in my bachelor thesis, in particular in section 3.1.
General description:
Note that this description is based on both the above mentioned article and my bachelor thesis. Both offer a mathematically concise description.
Given an image with width W and height H, SEEDS starts by grouping pixels into blocks of size w x h. These blocks are further arranged into groups of 2 x 2. This scheme is repeated for L levels (this is the number of levels parameter). So at level l, you have blocks of size
w*2^(l - 1) x h*2^(l - 1).
The number of superpixels is determined by the blocks at level L, i.e. letting w_L and h_L denote the width and height of the blocks at level L, the number of superpixels is
S = W/w_L * H/h_L
where we use integer divisions.
This initial superpixel segmentation is now iteratively refined by exchanging blocks of pixels and individual pixels between neighboring superpixels. To this end, color histograms of the superpixels and of all blocks are computed (the histograms are determined by the number of bins parameter in the implementation). This can be done efficiently by noting that the histogram of a superpixel is just the sum of the histograms of the 2 x 2 blocks it consists of, and the histogram of one of these blocks is the sum of the histograms of the 2 x 2 underlying blocks (and so on).
So let h_i be the histogram of a block of pixels belonging to superpixel j, and h_j the histogram of this superpixel. Then, the similarity of block i to superpixel j is computed by the histogram intersection of h_i and h_j (see one of the above resources for the equation). Similarly, the similarity of a pixel and a superpixel is either the Euclidean distance of the pixel color to the superpixel's mean color (this is the better performing option), or the probability of the pixel's color belonging to the superpixel (which is simply the normalized entry of the superpixel's histogram at the pixel's color). With this background, the algorithm can be summarized as follows:
initialize the block hierarchy and the initial superpixel segmentation
// for level l = L these are the initial superpixels
for l = L - 1 to 1 // go through all levels
    for each block in level l
        initialize the color histogram of this block
        // as described, this is done using the histograms of the level below
// now we start exchanging blocks between superpixels
for l = L - 1 to 1
    for each block at level l
        if the block lies at the border of a superpixel it does not belong to
            compute the histogram intersection with both superpixels
            assign the block to the superpixel with the highest intersection
// now we exchange individual pixels between superpixels
for all pixels
    if the pixel lies at the border of a superpixel it does not belong to
        compute the Euclidean distance of the pixel to both superpixels' mean colors
        assign the pixel to the closest superpixel
In practice, the block updates and pixel updates are iterated more than once (this is the number of iterations parameter), and often twice as many iterations per level are done (this is the double step parameter). In the original implementation, the number of superpixels is computed from w, h, L and the image size. In OpenCV, using the above equations, w and h are computed from the desired number of superpixels and the number of levels (which are determined by the corresponding parameters).
One parameter remains unclear: the prior, which tries to enforce smooth boundaries. In practice, this is done by considering the 3 x 3 neighborhood around a pixel that is about to be updated. If most of the pixels in this neighborhood belong to superpixel j, the pixel to be updated is also more likely to belong to superpixel j (and vice versa). OpenCV's implementation, as well as my implementation (SEEDS Revised), allows you to consider larger k x k neighborhoods, with k in {0,...,5} in the case of OpenCV.
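For completeness, here is a hedged Python sketch of how these parameters show up in OpenCV's implementation (this needs the ximgproc module from opencv-contrib; the file name and parameter values are placeholders):

import cv2

img = cv2.imread("image.png")  # placeholder input
height, width, channels = img.shape

seeds = cv2.ximgproc.createSuperpixelSEEDS(
    width, height, channels,
    200,    # desired number of superpixels S
    4,      # number of levels L
    2,      # prior: smoothing neighborhood, k in {0,...,5}, 0 disables it
    5,      # number of histogram bins
    False)  # double step: twice as many updates per level

seeds.iterate(img, 4)                    # number of iterations
labels = seeds.getLabels()               # per-pixel superpixel index
contours = seeds.getLabelContourMask()   # superpixel boundaries for visualization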

direction on image pattern description and representation

I have a basic question regarding pattern learning, or pattern representation. Assume I have a complex pattern of this form. Could you please provide me with some research directions or concepts I can follow to learn how to represent (mathematically describe) these kinds of patterns? In general the pattern does not have a closed contour, nor can it be represented with analytical objects like boxes, circles, etc.
By "mathematically describe" I'm assuming you mean deriving from the image a vector of values that represents the content of the image. In computer vision/image processing we call this an "image descriptor".
There are several image descriptors that could be applied to pixel based data of the form you showed, which appear to be 1 value per pixel i.e. greyscale images.
One approach is to perform "spatial gridding" where you divide the image up into a regular grid of a constant size e.g. a 4x4 grid. You then average the pixel values within each cell of the grid. Then concatenate these values to form a 16 element vector - this coarsely describes the pixel distribution of the image.
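A rough Python/NumPy sketch of that gridding idea (the 4x4 grid size is just an example):

import numpy as np

def grid_descriptor(image, grid=(4, 4)):
    # image: 2D array of greyscale pixel values.
    h, w = image.shape
    rows, cols = grid
    cell_means = []
    for i in range(rows):
        for j in range(cols):
            cell = image[i * h // rows:(i + 1) * h // rows,
                         j * w // cols:(j + 1) * w // cols]
            cell_means.append(cell.mean())
    # For a 4x4 grid this is a 16-element vector describing the pixel distribution.
    return np.array(cell_means)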
Another approach would be to use "image moments", which are 2D statistical moments. The central moment of order (i, j) is
mu_ij = sum_{x=1}^{W} sum_{y=1}^{H} (x - mu_x)^i * (y - mu_y)^j * f(x, y)
where f(x, y) is the pixel value at coordinates (x, y), W and H are the image width and height, and mu_x and mu_y are the average (intensity-weighted) x and y coordinates, i.e. the centroid. The values i and j select the order of moment you want to compute. Various orders of moment can be combined in different ways; for example, the "Hu moments" are 7 numbers computed from combinations of image moments.
The cool thing about the Hu moments is that you can scale, rotate, or flip the image and still get the same 7 values, which makes this a robust (translation-, scale- and rotation-invariant) image descriptor.
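If you want to try this out, OpenCV already implements both plain image moments and the Hu moments; a short Python sketch (the file name is a placeholder):

import cv2
import numpy as np

img = cv2.imread("pattern.png", cv2.IMREAD_GRAYSCALE)

moments = cv2.moments(img)     # raw, central and normalized central moments
hu = cv2.HuMoments(moments)    # the 7 Hu moment invariants, shape (7, 1)

# The values span several orders of magnitude, so a log scale is often
# used before comparing descriptors.
hu_log = -np.sign(hu) * np.log10(np.abs(hu))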
Hope this helps as a general direction to read more in.

K Nearest Neighbor classifier

I have a set of about 200 points (x, y) from an image. The 200 data points belong to 11 classes (which I think will become the class labels). My problem is how to represent the x, y values as one data element.
My first thought was to represent them separately with the labels, and then when I get a point to classify, classify x and y separately. Something tells me that this is incorrect.
Please advise me on how to represent the x, y values as one data element.
I can't see what problem you're running into. In the kNN algorithm we can use variables with multiple dimensions; you just need to organize the data with a list from the Python standard library or an array from the NumPy library, such as group = numpy.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
or group = [[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]] to represent (1.0, 1.1), (1.0, 1.0), (0, 0), (0, 0.1).
However, I suggest using NumPy, as it has many functions implemented in C, which keeps programs efficient.
If you use NumPy, you'd better do all the operations in a vectorized (matrix) way; for example, you can use point = numpy.tile([0, 0], (3, 1)) and distance(group - point) (where distance is a function I wrote) to calculate the distances without iteration.
The key is not representation but distance calculation. The points in your case are essentially one element with two dimensions (x, y). The kNN algorithm can handle the n-dimensional case itself: it finds the k nearest neighbors. So you can use the Euclidean distance d((x1, y1), (x2, y2)) = ((x1 - x2)^2 + (y1 - y2)^2)^0.5, where (x1, y1) represents the first point, as the distance between points in your case.
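To make this concrete, here is a minimal NumPy sketch of kNN over such 2D points (the labels and k below are made up for illustration):

import numpy as np

def knn_predict(train_points, train_labels, query, k=3):
    # Classify a single (x, y) query point by majority vote of its k nearest neighbors.
    diffs = train_points - query                   # shape (N, 2)
    distances = np.sqrt((diffs ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(distances)[:k]            # indices of the k closest points
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Each row is one data element: a point with two dimensions (x, y).
points = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = np.array(["A", "A", "B", "B"])
print(knn_predict(points, labels, np.array([0.1, 0.2]), k=3))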
