My boss and I disagree as to what is going on with the CV_TM_CCORR_NORMED method for matchTemplate(); in openCV.
Can you please explain what is happening here especially the square root aspect of this equation.
Correlation is similarity of two signals,vectors etc. Suppose you have vectors
template=[0 1 0 0 1 0 ] A=[0 1 1 1 0 0] B =[ 1 0 0 0 0 1]
if you perform correlation between vectors and template to get which one is more similar ,you will see A is similar to template more than B because 1's are placed in corresponding indexes.This means the more nonzero elements corresponds the more correlation between vectors is.
In grayscale images the values are in the range of 0-255.Let's do that :
template=[10 250 36 30] A=[10 250 36 30] B=[220 251 240 210] .
Here it is clear that A is the same as template but correlation between B and template is bigger than A and template.In normalized cross correlation denumerator part of formula is solving this problem. If you check the formula below you can see that denumerator for B(x)template will be much bigger than A(x)template.
Formula as stated in opencv documentation :
In practice if you use cross correlation,if there is a brightness in a part of image , the correlation between that part and your template will be larger.But if you use normalized cross correlation you will get better result.
Think formula is this :
Before multiplying element by element you are normalizing two matrixes.By dividing root of square sum of all elements in matrix you are removing the gain;if all elements are large then divisor is large.
Think that you are dividing sum of all elements in matrix.If a pixel value is in a brighter area then its neighbours pixel values will be high.By dividing sum of its neighbourhood you are removing illumination effect.This is for image processing where pixel values are always positive.But for 2D matrix there may be some negative values so squaring ignores sign.
Gx = [-1 0 1
-2 0 2
-1 0 1]
Gy = [-1 -2 -1
0 0 0
1 2 1]
I knew these are the combination of smoothing filter and gradient but how are they combine to get this output ?
The Sobel Kernel is a convolution of the derivation kernel [-1 0 1] with a smoothing kernel [1 2 1]'. The former is straightforward, the later is rather arbitrary - you can see it as some sort of discrete implementation of a 1D Gaussian of a certain sigma if you want.
I think edge detection (ie gradient) influence is obvious - if there is a vertical edge. sobel operator Gx will definitely give big values relative to places where there is no edge because you just subtract two different values (intensity on the one side of an edge differs much from intensity on another side). The same thought on horizontal edges.
About smoothing, if you see e.g. mask for gaussian function for simga=1.0:
which actually does smoothing, you can catch an idea: we actaully set a pixel to a value associated to values of its neighbor. It means we 'average' values respectively to the pixel we are considering. In our case, Gx and Gy, it perforsm slightly smoothing in comparision to gaussian, but still idea remains the same.
I am taking this course on Neural networks in Coursera by Geoffrey Hinton (not current).
I have a very basic doubt on weight spaces.
https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides%2Flec2.pdf
Page 18.
If I have a weight vector (bias is 0) as [w1=1,w2=2] and training case as {1,2,-1} and {2,1,1}
where I guess {1,2} and {2,1} are the input vectors. How can it be represented geometrically?
I am unable to visualize it? Why is training case giving a plane which divides the weight space into 2? Could somebody explain this in a coordinate axes of 3 dimensions?
The following is the text from the ppt:
1.Weight-space has one dimension per weight.
2.A point in the space has particular setting for all the weights.
3.Assuming that we have eliminated the threshold each hyperplane could be represented as a hyperplane through the origin.
My doubt is in the third point above. Kindly help me understand.
It's probably easier to explain if you look deeper into the math. Basically what a single layer of a neural net is performing some function on your input vector transforming it into a different vector space.
You don't want to jump right into thinking of this in 3-dimensions. Start smaller, it's easy to make diagrams in 1-2 dimensions, and nearly impossible to draw anything worthwhile in 3 dimensions (unless you're a brilliant artist), and being able to sketch this stuff out is invaluable.
Let's take the simplest case, where you're taking in an input vector of length 2, you have a weight vector of dimension 2x1, which implies an output vector of length one (effectively a scalar)
In this case it's pretty easy to imagine that you've got something of the form:
input = [x, y]
weight = [a, b]
output = ax + by
If we assume that weight = [1, 3], we can see, and hopefully intuit that the response of our perceptron will be something like this:
With the behavior being largely unchanged for different values of the weight vector.
It's easy to imagine then, that if you're constraining your output to a binary space, there is a plane, maybe 0.5 units above the one shown above that constitutes your "decision boundary".
As you move into higher dimensions this becomes harder and harder to visualize, but if you imagine that that plane shown isn't merely a 2-d plane, but an n-d plane or a hyperplane, you can imagine that this same process happens.
Since actually creating the hyperplane requires either the input or output to be fixed, you can think of giving your perceptron a single training value as creating a "fixed" [x,y] value. This can be used to create a hyperplane. Sadly, this cannot be effectively be visualized as 4-d drawings are not really feasible in browser.
Hope that clears things up, let me know if you have more questions.
I have encountered this question on SO while preparing a large article on linear combinations (it's in Russian, https://habrahabr.ru/post/324736/). It has a section on the weight space and I would like to share some thoughts from it.
Let's take a simple case of linearly separable dataset with two classes, red and green:
The illustration above is in the dataspace X, where samples are represented by points and weight coefficients constitutes a line. It could be conveyed by the following formula:
w^T * x + b = 0
But we can rewrite it vice-versa making x component a vector-coefficient and w a vector-variable:
x^T * w + b = 0
because dot product is symmetrical. Now it could be visualized in the weight space the following way:
where red and green lines are the samples and blue point is the weight.
More possible weights are limited to the area below (shown in magenta):
which could be visualized in dataspace X as:
Hope it clarifies dataspace/weightspace correlation a bit. Feel free to ask questions, will be glad to explain in more detail.
The "decision boundary" for a single layer perceptron is a plane (hyper plane)
where n in the image is the weight vector w, in your case w={w1=1,w2=2}=(1,2) and the direction specifies which side is the right side. n is orthogonal (90 degrees) to the plane)
A plane always splits a space into 2 naturally (extend the plane to infinity in each direction)
you can also try to input different value into the perceptron and try to find where the response is zero (only on the decision boundary).
Recommend you read up on linear algebra to understand it better:
https://www.khanacademy.org/math/linear-algebra/vectors_and_spaces
For a perceptron with 1 input & 1 output layer, there can only be 1 LINEAR hyperplane. And since there is no bias, the hyperplane won't be able to shift in an axis and so it will always share the same origin point. However, if there is a bias, they may not share a same point anymore.
I think the reason why a training case can be represented as a hyperplane because...
Let's say
[j,k] is the weight vector and
[m,n] is the training-input
training-output = jm + kn
Given that a training case in this perspective is fixed and the weights varies, the training-input (m, n) becomes the coefficient and the weights (j, k) become the variables.
Just as in any text book where z = ax + by is a plane,
training-output = jm + kn is also a plane defined by training-output, m, and n.
Equation of a plane passing through origin is written in the form:
ax+by+cz=0
If a=1,b=2,c=3;Equation of the plane can be written as:
x+2y+3z=0
So,in the XYZ plane,Equation: x+2y+3z=0
Now,in the weight space;every dimension will represent a weight.So,if the perceptron has 10 weights,Weight space will be 10 dimensional.
Equation of the perceptron: ax+by+cz<=0 ==> Class 0
ax+by+cz>0 ==> Class 1
In this case;a,b & c are the weights.x,y & z are the input features.
In the weight space;a,b & c are the variables(axis).
So,for every training example;for eg: (x,y,z)=(2,3,4);a hyperplane would be formed in the weight space whose equation would be:
2a+3b+4c=0
passing through the origin.
I hope,now,you understand it.
Consider we have 2 weights. So w = [w1, w2]. Suppose we have input x = [x1, x2] = [1, 2]. If you use the weight to do a prediction, you have z = w1*x1 + w2*x2 and prediction y = z > 0 ? 1 : 0.
Suppose the label for the input x is 1. Thus, we hope y = 1, and thus we want z = w1*x1 + w2*x2 > 0. Consider vector multiplication, z = (w ^ T)x. So we want (w ^ T)x > 0. The geometric interpretation of this expression is that the angle between w and x is less than 90 degree. For example, the green vector is a candidate for w that would give the correct prediction of 1 in this case. Actually, any vector that lies on the same side, with respect to the line of w1 + 2 * w2 = 0, as the green vector would give the correct solution. However, if it lies on the other side as the red vector does, then it would give the wrong answer.
However, suppose the label is 0. Then the case would just be the reverse.
The above case gives the intuition understand and just illustrates the 3 points in the lecture slide. The testing case x determines the plane, and depending on the label, the weight vector must lie on one particular side of the plane to give the correct answer.
I am new to image processing and had to do some edge detection. I understood that there are 2 types of detectors- Gaussian and Laplacian which look for maximas and zero crossings respectively. What I don't understand is how this is implemented by simply convolving the image with 2d kernels. I mean how does convolving equals finding maxima and zero crossing?
Laplacian zero crossing is a 2nd derivative operation, since the local maxima is equivalent with a zero crossing in a 2nd derivative. So it can be written as f_xx+f_yy. If we use a 1D vector to represent f_xx and f_yy, it is [-1 2 -1] (f(x+1,y)-2*f(x,y)+f(x-1,y)). Since the laplacian is f_xx + f_yy, it can be rephrased in a 2D kernal:
0 -1 0
-1 4 -1
0 -1 0
or if you consider the diagonal elements as well, it is:
-1 -1 -1
-1 8 -1
-1 -1 -1
On the other hand,Gaussian kernal as a low pass filter is used here for scaling. The scaling ratio is controlled by the sigma. This mainly enhances the edges with different widths. Basically the larger the sigma, the thicker edges are enhanced.
Combined Laplacian and Gaussian is mathematically equivalent with G_xx + G_yy where G is the Gaussian kernel. But usually people used Difference of Gaussian instead of Laplacian of Gaussian to reduce the computational cost.
The two operators for detecting and smoothing horizontal and vertical edges are shown below:
[-1 0 1]
[-2 0 2]
[-1 0 1]
and
[-1 -2 -1]
[ 0 0 0]
[ 1 2 1]
But after much Googling, I still have no idea where these operators come from. I would appreciate it if someone can show me how they are derived.
The formulation was proposed by Irwin Sobel a long time ago. I think about 1974. There is a great page on the subject here.
The main advantage of convolving the 9 pixels surrounding one at which gradients are to be detected is that this simple operator is really fast and can be done with shifts and adds in low-cost hardware.
They are not the greatest edge detectors in the world - Google Canny edge detectors for something better, but they are fast and suitable for a lot of simple applications.
So spatial filters, like the Sobel kernels, are applied by "sliding" the kernel over the image (this is called convolution). If we take this kernel:
[-1 0 1]
[-2 0 2]
[-1 0 1]
After applying the Sobel operator, each result pixel gets a:
high (positive) value if the pixels on the right side are bright and pixels on the left are dark
low (negative) value if the pixels on the right side are dark and pixels on the left are bright.
This is because in discrete 2D convolution, the result is the sum of each kernel value multiplied by the corresponding image pixel. Thus a vertical edge causes the value to have a large negative or positive value, depending on the direction of the edge gradient. We can then take the absolute value and scale to interval [0, 1], if we want to display the edges as white and don't care about the edge direction.
This works identically for the other kernel, except it finds horizontal edges.