I am new to image processing and had to do some edge detection. I understood that there are two types of detectors, Gaussian and Laplacian, which look for maxima and zero crossings respectively. What I don't understand is how this is implemented by simply convolving the image with 2D kernels. I mean, how does convolving equal finding maxima and zero crossings?
Laplacian zero crossing is a 2nd-derivative operation: a local maximum of the gradient corresponds to a zero crossing of the 2nd derivative. So it can be written as f_xx + f_yy. In 1D, the second derivative f(x+1,y) - 2*f(x,y) + f(x-1,y) corresponds to the kernel [1 -2 1], or [-1 2 -1] if you negate it, which is the sign convention used below. Since the Laplacian is f_xx + f_yy, it can be rephrased as a 2D kernel:
0 -1 0
-1 4 -1
0 -1 0
or if you consider the diagonal elements as well, it is:
-1 -1 -1
-1 8 -1
-1 -1 -1
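For concreteness, here is a minimal sketch (not from the original answer) of applying that kernel with SciPy and marking zero crossings; the simple sign-change test is just illustrative:

import numpy as np
from scipy.ndimage import convolve

# 4-neighbour Laplacian kernel from above (sign convention with +4 in the centre)
laplacian = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]], dtype=float)

def zero_crossings(image):
    """Convolve with the Laplacian, then mark sign changes between neighbours."""
    response = convolve(image.astype(float), laplacian)
    edges = np.zeros(response.shape, dtype=bool)
    # a zero crossing: the response changes sign between adjacent pixels
    edges[:, :-1] |= np.sign(response[:, :-1]) != np.sign(response[:, 1:])
    edges[:-1, :] |= np.sign(response[:-1, :]) != np.sign(response[1:, :])
    return edges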
On the other hand, the Gaussian kernel, being a low-pass filter, is used here for scale selection. The amount of smoothing is controlled by sigma, and it determines which edge widths get enhanced: basically, the larger the sigma, the thicker the edges that are enhanced.
The combined Laplacian and Gaussian (Laplacian of Gaussian, LoG) is mathematically equivalent to G_xx + G_yy, where G is the Gaussian kernel. In practice, people often use the Difference of Gaussians (DoG) instead of the Laplacian of Gaussian to reduce the computational cost.
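A minimal sketch of both variants with scipy.ndimage (the sigma values and the ratio k are just example choices, not from the answer):

import numpy as np
from scipy.ndimage import gaussian_laplace, gaussian_filter

def log_response(image, sigma=2.0):
    # Laplacian of Gaussian: smooth with a Gaussian, then take the Laplacian
    return gaussian_laplace(image.astype(float), sigma=sigma)

def dog_response(image, sigma=2.0, k=1.6):
    # Difference of Gaussians: approximates the LoG up to sign and scale
    img = image.astype(float)
    return gaussian_filter(img, sigma) - gaussian_filter(img, k * sigma)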
I have 96x96 pixel grayscale facial images. I am trying to find the eye centers and lip corners. I applied one Gabor filter (theta=pi/2, lambda=1.50) on the facial image, and after convolving I get the filter output like this.
As you can see from the Gabor output, the eyes and mouth corners are clearly distinguishable. I apply scikit-learn KMeans clustering to group the pixels into 4 clusters (2 eyes and 2 lip corners):
import numpy as np
from sklearn.cluster import KMeans

# output: the Gabor-filtered images, shape (n_images, 96, 96)
data = output.reshape(-1, 96 * 96)   # one 9216-dimensional vector per image
estimator = KMeans(n_clusters=4)
estimator.fit(data)
centroids = np.asarray(estimator.cluster_centers_)
print('Cluster centers', centroids.shape)
print('Labels', estimator.labels_, estimator.labels_.shape)
Output
Input X,y: (100, 96, 96) (1783, 1)
Gabor Filters (1, 9, 9)
Final output X,y (100, 96, 96) (0,)
Shape estimator.cluster_centers_: (4, 9216)
Now comes the question: how do I plot the x,y coordinates of the 4 cluster centers? Will I see the eye centers and mouth corners?
Further information: I plotted estimator.cluster_centers_ and the output looks like a code book; I see no coordinates of the cluster centroids.
I am using the steps as described in this paper: http://jyxy.tju.edu.cn/Precision/MOEMS/doc/p36.pdf
I think there's some confusion here about the space in which you're doing your K-means clustering. In the code snippet that you included in your question, you're training up a KMeans model using the vectorized face images as data points. K-means clusters live in the same space as the data you give it, so (as you noticed) your cluster centroids will also be vectorized face images. Importantly, these face images have dimension 9216, not dimension 2 (i.e., x-y coordinates)!
To get 2-dimensional (x, y) coordinates as your K-means centroids, you need to run the algorithm using 2-dimensional input data. Just off the top of my head, it seems like you could apply a darkness threshold to your face images and assemble a clustering dataset of only the dark pixel locations. Then after you run K-means on this dataset, the centroids will hopefully be close to the pixel locations in your face images where there are the most dark pixels. These locations (assuming the face images in your training data were already somewhat registered) should be somewhat close to the eyes and mouth corners that you're hoping for.
This can be really confusing so I'll try to add an example. Let's say just for an example you have "face images" that are 3 pixels wide by 4 pixels tall. After thresholding the pixels in one of your images it might look like:
0 1 2 <-- x coordinates
0 0 0 0 ^ y coordinates
0 1 0 1 |
1 0 0 2 |
0 0 1 3 v
If you use this "image" directly in K-means, you're really running your K-means algorithm in a 12-dimensional space, and the image above would be vectorized as:
0 0 0 0 1 0 1 0 0 0 0 1
Then your K-means cluster centroids will also live in this same 12-dimensional space.
What I'm trying to suggest is that you could extract the (x, y) coordinates of the 1s in each image, and use those as the data for your K-means algorithm. So for the example image above, you'd get the following data points:
1 1
0 2
2 3
In this example, we've extracted 3 2-dimensional points from this one "image"; with more images you'd get more 2-dimensional points. After you run K-means with these 2-dimensional data points, you'll get cluster centroids that can also be interpreted as pixel locations in the original images. You could plot those centroid locations on top of your images and see where they correspond in the images.
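A minimal sketch of that workflow (the function, variable names and the threshold are mine, not from the original code):

import numpy as np
from sklearn.cluster import KMeans

def landmark_centroids(filtered_image, threshold=0.5, n_clusters=4):
    """Cluster the (x, y) coordinates of strong filter responses into n_clusters centres."""
    ys, xs = np.nonzero(filtered_image > threshold)   # pixel coordinates above the threshold
    points = np.column_stack([xs, ys])                # shape (n_points, 2)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(points)
    return km.cluster_centers_                        # shape (n_clusters, 2): (x, y) pairs

# centres = landmark_centroids(gabor_output)
# plt.imshow(gabor_output, cmap='gray'); plt.scatter(centres[:, 0], centres[:, 1])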
My boss and I disagree as to what is going on with the CV_TM_CCORR_NORMED method for matchTemplate() in OpenCV.
Can you please explain what is happening here, especially the square-root part of the equation?
Correlation measures the similarity of two signals, vectors, etc. Suppose you have the vectors:
template = [0 1 0 0 1 0], A = [0 1 1 1 0 0], B = [1 0 0 0 0 1]
If you correlate each vector with the template to see which one is more similar, you will see that A is more similar to the template than B, because the 1's sit at corresponding indexes. In other words, the more nonzero elements that correspond, the larger the correlation between the vectors.
In grayscale images the values are in the range 0-255. Let's try that:
template = [10 250 36 30], A = [10 250 36 30], B = [220 251 240 210]
Here it is clear that A is identical to the template, yet the raw correlation between B and the template is bigger than between A and the template. The denominator in the normalized cross-correlation formula solves this problem: if you check the formula below, you can see that the denominator for B⊗template will be much bigger than for A⊗template.
The formula, as stated in the OpenCV documentation (shown there as an image), is:

R(x,y) = sum_{x',y'}( T(x',y') * I(x+x',y+y') ) / sqrt( sum_{x',y'} T(x',y')^2 * sum_{x',y'} I(x+x',y+y')^2 )
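You can check the effect of that denominator on the numbers above; a quick sketch (not from the original answer):

import numpy as np

template = np.array([10, 250, 36, 30], dtype=float)
A = np.array([10, 250, 36, 30], dtype=float)
B = np.array([220, 251, 240, 210], dtype=float)

def ccorr(t, v):
    return np.dot(t, v)                                        # plain cross-correlation

def ccorr_normed(t, v):
    return np.dot(t, v) / (np.linalg.norm(t) * np.linalg.norm(v))

print(ccorr(template, A), ccorr(template, B))                  # 64796.0 79890.0 -> B "wins"
print(ccorr_normed(template, A), ccorr_normed(template, B))    # 1.0 ~0.68 -> A wins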
In practice, if you use plain cross-correlation and there is a bright region somewhere in the image, the correlation between that region and your template will be large regardless of content. If you use normalized cross-correlation, you get a much better result.
Think of the formula as normalizing the two matrices before multiplying them element by element. By dividing by the square root of the sum of squares of all elements in a matrix, you remove the gain: if all elements are large, then the divisor is large.
Think of it as dividing by the sum of all elements in the matrix. If a pixel lies in a brighter area, then its neighbouring pixel values will also be high; by dividing by the sum over its neighbourhood you remove the illumination effect. That is fine for image processing, where pixel values are always positive, but a general 2D matrix may contain negative values, so squaring is used to ignore the sign.
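To see the brightness effect with OpenCV itself, here is a small illustrative sketch (the synthetic image and offsets are made up):

import numpy as np
import cv2

rng = np.random.default_rng(0)
template = rng.integers(0, 100, (16, 16)).astype(np.float32)
image = rng.integers(0, 100, (64, 64)).astype(np.float32)
image[5:21, 5:21] = template                  # exact copy of the template
image[30:46, 30:46] = template + 120          # same pattern in a much brighter region

raw = cv2.matchTemplate(image, template, cv2.TM_CCORR)
normed = cv2.matchTemplate(image, template, cv2.TM_CCORR_NORMED)

print(np.unravel_index(raw.argmax(), raw.shape))        # plain correlation prefers the bright copy
print(np.unravel_index(normed.argmax(), normed.shape))  # normalized picks the true match at (5, 5)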
Gx = [-1 0 1
-2 0 2
-1 0 1]
Gy = [-1 -2 -1
0 0 0
1 2 1]
I know these are a combination of a smoothing filter and a gradient filter, but how are they combined to get this output?
The Sobel kernel is a convolution of the derivative kernel [-1 0 1] with a smoothing kernel [1 2 1]'. The former is straightforward; the latter is rather arbitrary - you can see it as a discrete approximation of a 1D Gaussian of a certain sigma if you want.
I think the edge detection (i.e. gradient) influence is obvious: if there is a vertical edge, the Sobel operator Gx will give big values relative to places where there is no edge, because you are subtracting two different values (the intensity on one side of the edge differs a lot from the intensity on the other side). The same reasoning applies to horizontal edges.
About smoothing: if you look at, e.g., the mask for a Gaussian with sigma = 1.0 (shown as an image in the original post), which actually does the smoothing, you can catch the idea: we set a pixel to a value derived from the values of its neighbours. In other words, we 'average' values around the pixel we are considering. In the case of Gx and Gy this smoothing is only slight compared to a full Gaussian, but the idea remains the same.
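If the mask image is not visible, here is a rough sketch of how such a mask can be computed (a sampled and normalized 2-D Gaussian; the 3x3 size is just an example):

import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    """Sample a 2-D Gaussian on a size x size grid and normalize it to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

print(np.round(gaussian_kernel(3, 1.0), 3))
# [[0.075 0.124 0.075]
#  [0.124 0.204 0.124]
#  [0.075 0.124 0.075]]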
The two operators for detecting and smoothing horizontal and vertical edges are shown below:
[-1 0 1]
[-2 0 2]
[-1 0 1]
and
[-1 -2 -1]
[ 0 0 0]
[ 1 2 1]
But after much Googling, I still have no idea where these operators come from. I would appreciate it if someone can show me how they are derived.
The formulation was proposed by Irwin Sobel a long time ago, around 1968. There is a great page on the subject here.
The main advantage of convolving the 9 pixels surrounding the one at which the gradient is to be estimated is that this simple operator is really fast and can be implemented with shifts and adds in low-cost hardware.
They are not the greatest edge detectors in the world - Google Canny edge detectors for something better, but they are fast and suitable for a lot of simple applications.
So spatial filters, like the Sobel kernels, are applied by "sliding" the kernel over the image (this is called convolution). If we take this kernel:
[-1 0 1]
[-2 0 2]
[-1 0 1]
After applying the Sobel operator, each result pixel gets a:
high (positive) value if the pixels on the right side are bright and pixels on the left are dark
low (negative) value if the pixels on the right side are dark and pixels on the left are bright.
This is because in discrete 2D convolution the result is the sum of each kernel value multiplied by the corresponding image pixel. A vertical edge therefore produces a large negative or positive value, depending on the direction of the edge gradient. We can then take the absolute value and scale to the interval [0, 1] if we want to display the edges as white and don't care about the edge direction.
This works identically for the other kernel, except it finds horizontal edges.
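A tiny sketch of that behaviour on a made-up image with a dark-to-bright vertical step (scipy.ndimage.correlate slides the kernel without flipping it, which matches the "multiply and sum" description above; true convolution would only flip the sign for this antisymmetric kernel):

import numpy as np
from scipy.ndimage import correlate

gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)

# 5x6 image: dark on the left, bright on the right
image = np.array([[0, 0, 0, 9, 9, 9]] * 5, dtype=float)

print(correlate(image, gx, mode='nearest'))
# the two columns around the step give +36 (= 9 * (1 + 2 + 1)); flat areas give 0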
For image derivative computation, Sobel operator looks this way:
[-1 0 1]
[-2 0 2]
[-1 0 1]
I don't quite understand 2 things about it,
1. Why is the centre pixel 0? Can't I just use an operator like the one below:
[-1 1]
[-1 1]
[-1 1]
2. Why is the centre row 2 times the other rows?
I googled my questions but didn't find any answer that convinced me. Please help me.
In computer vision, there's very often no perfect, universal way of doing something. Most often, we just try an operator, see its results and check whether they fit our needs. It's true for gradient computation too: Sobel operator is one of many ways of computing an image gradient, which has proved its usefulness in many usecases.
In fact, the simplest gradient operator we could think of is even simpler than the one you suggest above:
[-1 1]
Despite its simplicity, this operator has a first problem: when you use it, you compute the gradient between two positions and not at one position. If you apply it to 2 pixels (x,y) and (x+1,y), have you computed the gradient at position (x,y) or (x+1,y)? In fact, what you have computed is the gradient at position (x+0.5,y), and working with half pixels is not very handy. That's why we add a zero in the middle:
[-1 0 1]
Applying this one to pixels (x-1,y), (x,y) and (x+1,y) will clearly give you a gradient for the center pixel (x,y).
This one can also be seen as the sum of two shifted [-1 1] filters: [-1 1 0], which computes the gradient at position (x-0.5,y), to the left of the pixel, and [0 -1 1], which computes the gradient to the right of the pixel.
Now this filter still has another disadvantage: it's very sensitive to noise. That's why we decide not to apply it on a single row of pixels, but on 3 rows: this gives an average gradient over these 3 rows, which softens possible noise:
[-1 0 1]
[-1 0 1]
[-1 0 1]
But this one tends to average things a little too much: when applied to one specific row, we lose much of what makes the detail of this specific row. To fix that, we want to give a little more weight to the center row, which will allow us to get rid of possible noise by taking into account what happens in the previous and next rows, but still keeping the specificity of that very row. That's what gives the Sobel filter:
[-1 0 1]
[-2 0 2]
[-1 0 1]
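A quick, illustrative sketch of that noise argument (the test image, the noise level and the division by 4 to match the edge gain are my choices, not from the answer): compare the single-row [-1 0 1] against the full Sobel kernel on a noisy vertical edge.

import numpy as np
from scipy.ndimage import correlate

rng = np.random.default_rng(0)
image = np.zeros((50, 50))
image[:, 25:] = 1.0                               # ideal vertical dark-to-bright edge
image += rng.normal(scale=0.2, size=image.shape)  # add noise

row_filter = np.array([[-1, 0, 1]], dtype=float)  # no smoothing across rows
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

plain = correlate(image, row_filter)
sobel = correlate(image, sobel_x) / 4.0           # divide by 4 so the edge responses match

# noise in a flat region far from the edge: the Sobel estimate fluctuates less
print(plain[:, 5:20].std(), sobel[:, 5:20].std())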
Tampering with the coefficients can lead to other gradient operators such as the Scharr operator, which gives just a little more weight to the center row:
[-3 0 3 ]
[-10 0 10]
[-3 0 3 ]
There are also mathematical reasons to this, such as the separability of these filters... but I prefer seeing it as an experimental discovery which proved to have interesting mathematical properties, as experiment is in my opinion at the heart of computer vision.
Only your imagination is the limit to create new ones, as long as it fits your needs...
EDIT: The true reason that the Sobel operator looks that way can be found by reading an interesting article by Sobel himself. My quick reading of this article indicates that Sobel's idea was to get an improved estimate of the gradient by averaging the horizontal, vertical and diagonal central differences. When you break the gradient into vertical and horizontal components, the diagonal central differences are included in both, while the vertical and horizontal central differences are only included in one. To avoid double counting, the diagonals should therefore have half the weight of the vertical and horizontal differences. The actual weights of 1 and 2 are just convenient for fixed-point arithmetic (and actually include a scale factor of 16).
I agree with @mbrenon mostly, but there are a couple of points too hard to make in a comment.
Firstly, in computer vision, the "most often, we just try an operator" approach just wastes time and gives poor results compared to what might have been achieved. (That said, I like to experiment too.)
It is true that a good reason to use [-1 0 1] is that it centres the derivative estimate at the pixel. But another good reason is that it is the central difference formula, and you can prove mathematically that it gives a lower error in its estimate of the true derivative than [-1 1].
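A quick numerical sketch of that claim, using sin(x), whose true derivative is known (the grid size is arbitrary):

import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
h = x[1] - x[0]
f = np.sin(x)
true = np.cos(x)

forward = (f[1:] - f[:-1]) / h               # [-1 1]: error is O(h)
central = (f[2:] - f[:-2]) / (2 * h)         # [-1 0 1]: error is O(h^2)

print(np.abs(forward - true[:-1]).max())     # roughly 1.6e-2
print(np.abs(central - true[1:-1]).max())    # roughly 1.7e-4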
[1 2 1] is used to filter noise, as mbrenon said. The reason these particular numbers work well is that they are an approximation of a Gaussian, which is the only filter that does not introduce artifacts (although, judging from Sobel's article, this seems to be a coincidence). Now if you want to reduce noise and you are finding a horizontal derivative, you want to filter in the vertical direction so as to affect the derivative estimate as little as possible. Convolving transpose([1 2 1]) with [-1 0 1] gives the Sobel operator, i.e.:
[1]              [-1 0 1]
[2] * [-1 0 1] = [-2 0 2]
[1]              [-1 0 1]
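In NumPy terms, just to verify the separability (a trivial sketch):

import numpy as np

smooth = np.array([1, 2, 1])    # smoothing across rows
deriv = np.array([-1, 0, 1])    # central difference along x

print(np.outer(smooth, deriv))
# [[-1  0  1]
#  [-2  0  2]
#  [-1  0  1]]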
For a 2D image you need a mask. Say this mask is:
[ a11 a12 a13;
a21 a22 a23;
a31 a32 a33 ]
Df_x (the gradient along x) should be produced from Df_y (the gradient along y) by a rotation of 90°, i.e. the mask should be:
[ a11 a12 a11;
a21 a22 a21;
a31 a32 a31 ]
Now, if we want to subtract the signal in front of the middle pixel from the signal behind it (that's what differentiation is in the discrete case: subtraction), we want to allocate the same weights to both sides of the subtraction, i.e. our mask becomes:
[ a11 a12 a11;
a21 a22 a21;
-a11 -a12 -a11 ]
Next, the sum of the weights should be zero, because when we have a smooth image (e.g. all 255s) we want a zero response, i.e. we get:
[ a11 a12 a11;
a21 -2a21 a21;
-a11 -a12 -a11 ]
In the case of a smooth image we also expect the differentiation along the X-axis to produce zero, i.e.:
[ a11 a12 a11;
0 0 0;
-a11 -a12 -a11 ]
Finally if we normalize we get:
[ 1 A 1;
0 0 0;
-1 -A -1 ]
and you can set A to anything you want experimentally. A factor of 2 gives the original Sobel filter.
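A tiny sketch of that family of masks (the function name is mine): A = 1 gives the Prewitt operator, A = 2 gives the Sobel operator.

import numpy as np

def y_gradient_mask(A):
    """The parametrized mask [1 A 1; 0 0 0; -1 -A -1] derived above."""
    return np.array([[ 1,  A,  1],
                     [ 0,  0,  0],
                     [-1, -A, -1]], dtype=float)

prewitt_y = y_gradient_mask(1)
sobel_y = y_gradient_mask(2)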