I have 96x96 pixel grayscale facial images and I am trying to find the eye centers and lip corners. I applied one Gabor filter (theta = pi/2, lambda = 1.50) to the facial image, and after convolving I get a filter output like this.
As you can see from the Gabor output, the eyes and mouth corners are clearly distinguishable. I apply scikit-learn KMeans clustering to group the pixels into 4 clusters (2 eyes and 2 lip corners):
import numpy as np
from sklearn.cluster import KMeans

# output has shape (100, 96, 96): one Gabor response per face image
data = output.reshape(-1, 96*96)   # each row is a vectorized 96x96 image
estimator = KMeans(n_clusters=4)
estimator.fit(data)
centroids = np.asarray(estimator.cluster_centers_)
print('Cluster centers', centroids.shape)
print('Labels', estimator.labels_, estimator.labels_.shape)
Output
Input X,y: (100, 96, 96) (1783, 1)
Gabor Filters (1, 9, 9)
Final output X,y (100, 96, 96) (0,)
Shape estimator.cluster_centers_: (4, 9216)
Now comes the question: how do I plot the x, y coordinates of the 4 cluster centers? Will I see the eye centers and mouth corners?
Further information: when I plot estimator.cluster_centers_, the output looks like a code book; I see no coordinates of the cluster centroids.
I am using the steps as described in this paper: http://jyxy.tju.edu.cn/Precision/MOEMS/doc/p36.pdf
I think there's some confusion here about the space in which you're doing your K-means clustering. In the code snippet that you included in your question, you're training up a KMeans model using the vectorized face images as data points. K-means clusters live in the same space as the data you give it, so (as you noticed) your cluster centroids will also be vectorized face images. Importantly, these face images have dimension 9216, not dimension 2 (i.e., x-y coordinates)!
To get 2-dimensional (x, y) coordinates as your K-means centroids, you need to run the algorithm using 2-dimensional input data. Just off the top of my head, it seems like you could apply a darkness threshold to your face images and assemble a clustering dataset of only the dark pixel locations. Then after you run K-means on this dataset, the centroids will hopefully be close to the pixel locations in your face images where there are the most dark pixels. These locations (assuming the face images in your training data were already somewhat registered) should be somewhat close to the eyes and mouth corners that you're hoping for.
This can be really confusing so I'll try to add an example. Let's say just for an example you have "face images" that are 3 pixels wide by 4 pixels tall. After thresholding the pixels in one of your images it might look like:
0 1 2 <-- x coordinates
0 0 0 0 ^ y coordinates
0 1 0 1 |
1 0 0 2 |
0 0 1 3 v
If you use this "image" directly in K-means, you're really running your K-means algorithm in a 12-dimensional space, and the image above would be vectorized as:
0 0 0 0 1 0 1 0 0 0 0 1
Then your K-means cluster centroids will also live in this same 12-dimensional space.
What I'm trying to suggest is that you could extract the (x, y) coordinates of the 1s in each image, and use those as the data for your K-means algorithm. So for the example image above, you'd get the following data points:
1 1
0 2
2 3
In this example, we've extracted 3 2-dimensional points from this one "image"; with more images you'd get more 2-dimensional points. After you run K-means with these 2-dimensional data points, you'll get cluster centroids that can also be interpreted as pixel locations in the original images. You could plot those centroid locations on top of your images and see where they correspond in the images.
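Here is a minimal sketch of that idea, assuming a single 96x96 Gabor response in which the feature points show up as dark (low-valued) pixels; the threshold and the random stand-in array are placeholders you would replace with your own data.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Stand-in for the 96x96 Gabor response of ONE face image; replace with your real array.
output = np.random.rand(96, 96)

# Assumption: the eye centers / mouth corners show up as dark (low-valued) pixels,
# so keep only the locations of pixels below an arbitrary darkness threshold.
threshold = output.mean() - output.std()
ys, xs = np.nonzero(output < threshold)
points = np.column_stack([xs, ys])            # shape (n_dark_pixels, 2)

# Cluster the 2-dimensional pixel coordinates, not the 9216-dimensional image vectors.
estimator = KMeans(n_clusters=4, n_init=10)
estimator.fit(points)
centroids = estimator.cluster_centers_        # shape (4, 2): one (x, y) per cluster

# Overlay the centroids on the image.
plt.imshow(output, cmap='gray')
plt.scatter(centroids[:, 0], centroids[:, 1], c='r', marker='+')
plt.show()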
I have implemented GLCM texture analysis on Sentinel-1 SAR imagery. The imagery is high resolution. The parameters for the GLCM texture analysis are:
Window size: 5x5
Quantizer: Probabilistic Quantizer
Quantization: 64 bit
Angle: 0 degrees
Displacement: 1
The output is 10 different texture images. However, the range of pixel values is not between 0 and 1; each texture has its own min and max. I believe the values should lie between 0 and 1, since GLCM is a probabilistic analysis computed for every pixel.
Am I missing a step?
I guess you are getting 10 different images because for each image pixel you are performing the following operations (sketched in code after this list):
Define a neighbourhood of 5×5 centered at the considered pixel.
Compute the GLCM corresponding to displacement=1 and angle=0 of that neighbourhood.
Extract 10 features from the local GLCM.
This results in a stack of 10 images, one image for each feature extracted from the local GLCMs.
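If it helps, here is a rough sketch of that per-pixel procedure using scikit-image's graycomatrix/graycoprops (these are the spellings in recent scikit-image versions; older releases use greycomatrix/greycoprops). The window size, number of grey levels and the two example features are placeholders, not necessarily the settings of your toolchain.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def local_glcm_features(img, win=5, levels=64, props=('contrast', 'homogeneity')):
    # Compute GLCM features in a win x win window around every pixel.
    # img is assumed to be already quantized to `levels` grey levels (0..levels-1).
    half = win // 2
    padded = np.pad(img, half, mode='reflect')
    out = {p: np.zeros(img.shape, dtype=float) for p in props}
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            window = padded[r:r + win, c:c + win]
            glcm = graycomatrix(window, distances=[1], angles=[0],
                                levels=levels, symmetric=True, normed=True)
            for p in props:
                out[p][r, c] = graycoprops(glcm, p)[0, 0]
    return out

# Example: a random 64-level image standing in for the quantized SAR band.
img = np.random.randint(0, 64, (32, 32), dtype=np.uint8)
features = local_glcm_features(img)
print({p: (round(f.min(), 3), round(f.max(), 3)) for p, f in features.items()})

Note that even with normed=True, derived features such as contrast or entropy are not confined to [0, 1], which is exactly the issue raised in the question.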
The problem is that Haralick features are not normalized to 1. Consider, for example, the standard definition of entropy:

\mathrm{Entropy} = -\sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j)\, \log_2 p(i,j)

If you wish to obtain an entropy value in the range [0, 1], you should divide the expression above by the maximum possible entropy (measured in bits), like this:

\mathrm{Entropy_{norm}} = \frac{-\sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j)\, \log_2 p(i,j)}{2 \log_2 N_g}

where N_g is the number of different grey levels.
This paper explains how to normalize contrast, correlation, energy, entropy and homogeneity features extracted from GLCM so that they have range [0, 1].
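As a small illustration of that normalization (assuming the GLCM entries already sum to 1), a sketch in Python:

import numpy as np

def normalized_glcm_entropy(glcm, n_levels):
    # glcm: (n_levels, n_levels) matrix whose entries sum to 1.
    # The maximum possible entropy of such a matrix is log2(n_levels**2)
    # = 2*log2(n_levels), so dividing by that bounds the result to [0, 1].
    p = glcm[glcm > 0]                      # avoid log(0)
    entropy = -np.sum(p * np.log2(p))
    return entropy / (2 * np.log2(n_levels))

# Example: a uniform GLCM over 64 levels has maximum entropy, so the result is 1.0.
uniform = np.full((64, 64), 1.0 / (64 * 64))
print(normalized_glcm_entropy(uniform, 64))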
I have 2 images taken from two different cameras and I have to associate an object in both images. I separated the RGB and YCbCr components and calculated the histogram of each component separately for both images.
Then I concatenated the histograms of all components into one vector.
I have already normalized each histogram separately so that sum(h) = 1, but after concatenating all the histograms the sum of the resulting vector is 6.
When I apply the Bhattacharyya distance to the two vectors, the result is in the range of 4 to 5.
I cannot understand these similarity results because, as far as I know, the Bhattacharyya distance should be between 0 and 1.
Please help.
The maximum separability value is 2: the Jeffreys-Matusita distance is a measure derived from the Bhattacharyya distance, and it ranges from 0 to 2.
If you have 2 classes and the Jeffreys-Matusita distance is near 2, they are well separated (good for classification); if it is near 0, the classes are essentially the same.
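To connect this to the numbers in the question, here is a small sketch (the helper name is mine). The concatenation of 6 unit-sum histograms sums to 6, so re-normalize the whole vector to sum to 1 first; the Bhattacharyya coefficient is then bounded to [0, 1], while the Bhattacharyya distance -ln(BC) is not bounded by 1. The Jeffreys-Matusita distance mentioned above maps the Bhattacharyya distance into [0, 2].

import numpy as np

def bhattacharyya_measures(h1, h2):
    # Illustrative helper (name is mine): compare two histograms after re-normalizing
    # each so that it sums to 1 (a concatenation of 6 unit-sum histograms sums to 6).
    p = np.asarray(h1, dtype=float); p = p / p.sum()
    q = np.asarray(h2, dtype=float); q = q / q.sum()
    bc = np.sum(np.sqrt(p * q))        # Bhattacharyya coefficient, in [0, 1]
    b = -np.log(bc)                    # Bhattacharyya distance, in [0, inf): NOT bounded by 1
    jm = 2.0 * (1.0 - np.exp(-b))      # Jeffreys-Matusita distance, in [0, 2]
    hellinger = np.sqrt(1.0 - bc)      # a bounded [0, 1] alternative
    return bc, b, jm, hellinger

# Example: two concatenated 6-part histograms (each part would sum to 1, so the
# whole vector sums to 6 before re-normalization, as in the question).
h_a = np.random.rand(6 * 32)
h_b = np.random.rand(6 * 32)
print(bhattacharyya_measures(h_a, h_b))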
I am trying to build a program which detects offside situation in a football video sequence. In order to track better players and ball I need to estimate the homography between consecutive frames. I am doing this project in Matlab.
I am able to find enough corresponding lines between frames but it seems to me that the resulting homography isn't correct.
I start from the following situation, where I have these two processed images (1280x720 px) with corresponding lines:
image 1 and image 2.
The lines come from the Hough transform and are of the form cross(P1, P2), where each P(i) is [x y 1]' with 0 < x, y < 1 (divided by the image width and height). The lines are normalized too (divided by their third component).
Before the line normalization (just after the cross product) I have the following.
Lines from frame 1 (one line per row):
[ -0.9986 -0.2992 0.6792
-0.9986 -0.4305 0.5686
-0.8000 -0.4500 0.3613
-0.9986 -0.1609 0.7890
-0.9986 -0.0344 0.9074
-0.2500 -0.2164 0.0546]
These are lines from frame 2:
[-0.9986 -0.2984 0.6760
-0.9986 -0.4313 0.5678
-0.7903 -0.4523 0.3587
-0.9986 -0.1609 0.7890
-0.9986 -0.0391 0.9066
-0.2486 -0.2148 0.0539]
After normalization, for each matching line (in this case all rows correspond) I create the matrix A(j):
[-u 0 u*x -v 0 v*x -1 0 x];
[0 -u u*y 0 -v v*y 0 -1 y];
where line(j)_1 is [x y 1]' and line(j)_2 is [u v 1]'. Then I stack these into the full matrix A and compute [~,~,V] = svd(A). Rearranging the last column of V into a 3x3 matrix gives H:
[0.4234 0.0024 -0.3962
-0.3750 -0.0030 0.3503
0.4622 0.0029 -0.4322]
This homography matrix works quite well for the parallel lines above and their vanishing point (the intersection of those lines), but it does a terrible job elsewhere. For example, one vanishing point is at (1194.2, -607.4) in unscaled coordinates; it is supposed to stay there, and it is indeed mapped to within a few pixels (5-10 px) of that location. But a random point such as (300, 300) is mapped to (1174.1, -582.7)!
I can't tell whether I made some big mistake or whether it is due to noise in the measurements. Can you help me?
Well, you computed a homography mapping lines to lines. If you want the corresponding pointwise homography you need to invert and transpose it. See, for example, Chapter 1.3.1 of Hartley and Zisserman's "Multiple View Geometry".
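For illustration, here is a small NumPy sketch of that invert-and-transpose step. It assumes the matrix estimated from the line correspondences (call it G, the 3x3 matrix printed in the question, used here purely as example data) maps lines as l2 ~ G*l1, so the corresponding point homography is the transpose of inv(G). Note that this particular G is nearly singular, which is consistent with the degenerate line configuration discussed below, so the numbers only illustrate the mechanics.

import numpy as np

# G: the 3x3 matrix recovered from the line correspondences (values from the question).
# Assumption: it maps lines, l2 ~ G @ l1.
G = np.array([[ 0.4234,  0.0024, -0.3962],
              [-0.3750, -0.0030,  0.3503],
              [ 0.4622,  0.0029, -0.4322]])

# Points and lines transform contragrediently: if points map as p2 ~ H @ p1,
# then lines map as l2 ~ inv(H).T @ l1. Hence the point homography is:
H = np.linalg.inv(G).T

def map_point(H, x, y):
    # Apply a homography to a Euclidean point and de-homogenize.
    p = H @ np.array([x, y, 1.0])
    return p[:2] / p[2]

# Use the same normalized coordinates (x/width, y/height) that were used
# when building the line correspondences.
print(map_point(H, 0.3, 0.4))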
From the images you posted, it looks like the lines you are considering are all parallel to each other in the scene. In that case the problem is ill-posed, because there are infinitely many homographies that explain the resulting correspondences. Try adding correspondences for lines with other directions.
I am new to image processing and had to do some edge detection. I understood that there are 2 types of detectors, Gaussian and Laplacian, which look for maxima and zero crossings respectively. What I don't understand is how this is implemented by simply convolving the image with 2D kernels. I mean, how does convolving amount to finding maxima and zero crossings?
The Laplacian zero crossing is a 2nd-derivative operation: a local maximum of the gradient corresponds to a zero crossing of the 2nd derivative. The Laplacian can be written as f_xx + f_yy. Representing the 1D second derivative f(x+1,y) - 2*f(x,y) + f(x-1,y) as a kernel gives [1 -2 1], or its negation [-1 2 -1] (the sign does not change where the zero crossings are). Since the Laplacian is f_xx + f_yy, it can be rephrased as a 2D kernel (here in the negated form):
0 -1 0
-1 4 -1
0 -1 0
or if you consider the diagonal elements as well, it is:
-1 -1 -1
-1 8 -1
-1 -1 -1
On the other hand, the Gaussian kernel is a low-pass filter and is used here for scale selection. The scale is controlled by sigma and determines which edge widths are enhanced: basically, the larger the sigma, the thicker the edges that are enhanced.
The combined Laplacian of Gaussian is mathematically equivalent to convolving with G_xx + G_yy, where G is the Gaussian kernel. In practice, people often use the Difference of Gaussians instead of the Laplacian of Gaussian to reduce the computational cost.
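To make that concrete, here is a rough sketch with SciPy (the synthetic square image and the parameter choices are placeholders): the convolution step only produces the second-derivative response; finding the zero crossings is a separate test for sign changes in that response.

import numpy as np
from scipy import ndimage

# Example image: a bright square on a dark background.
img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0

# Smooth with a Gaussian (scale set by sigma), then convolve with the Laplacian
# kernel from the answer above.
laplacian_kernel = np.array([[0, -1, 0],
                             [-1, 4, -1],
                             [0, -1, 0]], dtype=float)
smoothed = ndimage.gaussian_filter(img, sigma=2.0)
log_response = ndimage.convolve(smoothed, laplacian_kernel)

# Zero crossings: pixels whose 3x3 neighbourhood contains both negative and
# positive responses, i.e. where the second derivative changes sign.
minima = ndimage.minimum_filter(log_response, size=3)
maxima = ndimage.maximum_filter(log_response, size=3)
edges = (minima < 0) & (maxima > 0)
print(edges.sum(), "edge pixels found")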
I want to smooth the contours of binarized images and think that erosion is the best way to do it. I know the normal way is to use cvDilate(src, dst, 0, iter), where 0 means the default 3x3 kernel.
The problem is that the 3x3 kernel erodes my images too deeply. How can I erode with a 2x2 kernel, or anything smaller than the default 3x3 kernel?
Here you have for your reference the results of using different kernels:
Regards!
If your goal is a binarized image with smooth edges then, if you have the original, it is better to apply something like a Gaussian blur with cvSmooth() before performing the binarization.
That said, you are not restricted to 3x3 kernels. cvDilate() takes an IplConvKernel produced by CreateStructuringElementEx and you can make a structuring element with any (rectangular) shape with that function.
However, a structuring element works relative to an anchor point that must have integer coordinates, so a 2x2 matrix cannot be centered on a pixel. In most cases it is therefore best to use structuring elements with an odd number of rows and columns.
What you could do is create a 3x3 structuring element where only the center value and the values directly above, below, left and right of it are 1, like this:
0 1 0
1 1 1
0 1 0
rather than the default:
1 1 1
1 1 1
1 1 1
The first kernel will make for some slightly smoother edges.
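For reference, the same idea with the Python cv2 binding (assuming it is available; the stand-in image is mine): cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3)) produces exactly the cross-shaped kernel shown above.

import cv2
import numpy as np

# Stand-in binary image: a filled circle with a small notch in its edge.
binary = np.zeros((100, 100), dtype=np.uint8)
cv2.circle(binary, (50, 50), 30, 255, -1)
binary[40:60, 20] = 0

# Cross-shaped 3x3 element (center + 4-neighbours); it erodes less aggressively
# than the full 3x3 box.
cross = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
box = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

eroded_cross = cv2.erode(binary, cross, iterations=1)
eroded_box = cv2.erode(binary, box, iterations=1)
print((binary > 0).sum(), (eroded_cross > 0).sum(), (eroded_box > 0).sum())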
Here's a quick and dirty approach (sketched in code after this list) to tell you whether dilation/erosion will work for you:
Upsample your image.
Erode (dilate, open, close, whatever) with the smallest filter you can use (typically 3x3)
Downsample back to the original image size
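A rough sketch of those three steps with the Python cv2 binding (the interpolation choices, threshold and stand-in image are my own assumptions):

import cv2
import numpy as np

def gentle_morph(binary, op=cv2.MORPH_OPEN, scale=2):
    # Up-sample, apply a 3x3 morphological op, down-sample back.
    # Because the 3x3 kernel acts on the enlarged image, its effect on the
    # original-resolution image is roughly that of a smaller-than-3x3 kernel.
    h, w = binary.shape
    big = cv2.resize(binary, (w * scale, h * scale),
                     interpolation=cv2.INTER_NEAREST)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    big = cv2.morphologyEx(big, op, kernel)
    small = cv2.resize(big, (w, h), interpolation=cv2.INTER_AREA)
    return (small > 127).astype(np.uint8) * 255

# Example on a stand-in binary image.
img = np.zeros((64, 64), dtype=np.uint8)
cv2.rectangle(img, (16, 16), (48, 48), 255, -1)
print(gentle_morph(img).shape)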
With the C API, you can create a dedicated IplConvKernel object of any shape and size with the function CreateStructuringElementEx(). If you use the C++ API (function dilate()), the structuring element used for dilation can be any matrix (Mat) you want.
A kernel of all 1's used as a convolution kernel is a low-pass filter. Used for morphology, a dilation filter replaces each pixel with the darkest pixel in its 3x3 neighbourhood, and an erosion filter replaces each pixel with the lightest pixel in its 3x3 neighbourhood. That is, if your background is light and your foreground object is dark; if you flip background and foreground, you also flip the roles of dilation and erosion.
Also if you want to perform an 'open' operation, you perform an erosion followed by a dilation. Conversely a 'close' operation is a dilation followed by an erosion. Open tends to remove isolated clumps of pixels and close tends to fill in holes.
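A quick way to check those compositions with the Python cv2 binding (the test image is mine; note that in OpenCV's convention dilation grows the white, non-zero regions):

import cv2
import numpy as np

img = np.zeros((64, 64), dtype=np.uint8)
cv2.circle(img, (32, 32), 15, 255, -1)
img[5, 5] = 255                      # isolated speck: opening should remove it
img[31:33, 31:33] = 0                # small 2x2 hole: closing should fill it

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

# Open = erosion followed by dilation; close = dilation followed by erosion.
opened = cv2.dilate(cv2.erode(img, kernel), kernel)
closed = cv2.erode(cv2.dilate(img, kernel), kernel)

print("speck removed by opening:", opened[5, 5] == 0)
print("hole filled by closing:", closed[31, 31] == 255)
print("same as morphologyEx:",
      np.array_equal(opened, cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)),
      np.array_equal(closed, cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)))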
Erosion and dilation kernels should have odd size;
a 2x2 matrix cannot be used.
Convolution matrices should be of size 1x1, 3x3, 5x5, 7x7, ... i.e. always odd.
Try applying a close operation (dilate, then erode) using the cvMorphologyEx() function.