A GLCM considers the relation between two pixels at a time, called the reference pixel and the neighbour pixel. Depending on how the neighbour pixel is selected, generally 4 different Gray Level Co-occurrence Matrices (GLCMs) can be calculated for an image.
The neighbour pixel is selected as follows:
reference pixel | neighbour pixel
(x, y)          | (x+1, y)    the pixel to its right
(x, y)          | (x+1, y+1)  the pixel to its right and above
(x, y)          | (x, y+1)    the pixel above
(x, y)          | (x-1, y+1)  the pixel to its left and above
A good, detailed explanation of GLCM is available here (Original link).
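For reference, these four matrices could be computed, for example, with MATLAB's Image Processing Toolbox roughly as sketched below (the test image, the number of gray levels and the non-symmetric option are arbitrary choices for illustration):
% Minimal sketch: the four GLCMs for the offsets listed above,
% computed on a single-channel (gray-scale) image.
I = imread('cameraman.tif');       % any gray-scale test image
offsets = [ 0  1;                  % (x+1, y)   : the pixel to the right   (0 degrees)
           -1  1;                  % (x+1, y+1) : right and above          (45 degrees)
           -1  0;                  % (x,   y+1) : above                    (90 degrees)
           -1 -1];                 % (x-1, y+1) : left and above           (135 degrees)
glcms = graycomatrix(I, 'Offset', offsets, 'NumLevels', 8, 'Symmetric', false);
% glcms is 8x8x4: one co-occurrence matrix per offset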
My question is, is it required to consider all 3 intensity values of an image pixel when calculating Gray Level Co-occurrence Matrices (GLCM) of a "gray scale image"?
As an example consider an image with 2 pixels
        [pixel1]                                     [pixel2]
       /    |    \                                  /    |    \
[intensity1] [intensity2] [intensity3]    [intensity4] [intensity5] [intensity6]
When calculating the GLCM of a gray scale image is it required to take into account all 3 intensity values of a pixel?
E.g. when the reference pixel is (x, y) and its neighbour pixel is (x+1, y), the pixel to its right:
Is it required to take into account the occurrences of intensity levels individually as follows?
[intensity1] & [intensity2]
[intensity2] & [intensity3]
[intensity3] & [intensity4]
[intensity4] & [intensity5]
[intensity5] & [intensity6]
Or can I just take into account one intensity value from each pixel, assuming all 3 intensity values of a pixel are the same, as follows?
[intensity1] & [intensity4]
Which is the correct method? Is it applicable for all 4 neighbours?
The binning process, which is part of the point feature histogram estimation, results in b^3 bins if only the three angular features (alpha, phi, theta) are used, where b is the number of bins per feature.
Why is it b^3 and not b * 3?
Let's say we consider alpha.
The feature value range is subdivided into b intervals. You iterate over all neighbours of the query point and count the number of alpha values which lie in each interval. So you have b bins for alpha. When you repeat this for the other two features, you get 3 * b bins.
Where am I wrong?
For simplicity, I'll first explain it in 2D, i.e. with two angular features. In that case, you would have b^2 bins, not b*2.
The feature space is divided into a regular grid. Features are binned according to their position in the 2D (or 3D) space, not independently along each dimension. See the following example with two feature dimensions and b=4, where the feature is binned into the cell marked with #:
^ phi
|
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | |#|
+-+-+-+-+
| | | | |
+-+-+-+-+-> alpha
The feature is binned into the cell where alpha is in a given interval AND phi is in another interval. The key difference from your understanding is that the dimensions are not treated independently: each cell specifies an interval on all the dimensions, not on a single one.
(This would work the same way in 3D, only that you would have another dimension for theta and a 3D grid instead of a 2D one.)
This way of binning results in b^2 bins for the 2D case, since each interval in the alpha dimension is combined with ALL intervals in the phi dimension, resulting in a squaring of the number, not a doubling. Add another dimension, and you get the cubing instead of the tripling, as in your question.
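To make the joint binning concrete, here is a toy MATLAB sketch (the feature values are made up and scaled to [0,1); this is not the PCL implementation):
% Joint binning of three angular features into b^3 cells.
b = 4;                                        % intervals per feature dimension
n = 100;                                      % number of neighbour pairs (toy value)
alpha = rand(n,1); phi = rand(n,1); theta = rand(n,1);   % toy features in [0,1)
ia = min(floor(alpha * b), b-1);              % interval index along each dimension, 0..b-1
ip = min(floor(phi   * b), b-1);
it = min(floor(theta * b), b-1);
idx = ia + b*ip + b^2*it;                     % one joint index per pair -> b^3 possible cells
hist_pfh = accumarray(idx + 1, 1, [b^3, 1]);  % b^3 = 64 bins, not 3*b = 12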
We're having some visual artifacts on a normal map for a shader because of bands of single pixels that contrast strongly with their surroundings. Just to be clear, edges are not an issue, only these single-pixel bands.
Using something like typical Sobel edge detection would not work in this case because, on top of such a band, it would detect 0. I can think of other modifications to the kernel that might work, such as
-1 -2 -1
2 4 2
-1 -2 -1
but I assumed that there was likely a "correct" mathematical way to do such an operation.
In the end, I want to smooth these lines out using the surrounding pixels (so a selective blur). These lines could appear in any orientation, so if I were to use the above kernel, I would need to apply it in both directions and add the results to get the line intensity, similar to applying the Sobel kernel.
I assume that you have lines of 1 pixel width in your image that are brighter or darker than their surroundings and you want to find them and remove them from the image and replace the removed pixels by an average of the local neighborhood.
I developed an algorithm for this and it works on my example data (since you did not give any data). It has two parts:
Identification of lines
I could not think of a simple, yet effective filter to detect lines (which are connected, so one would probably need to look at correlations). So I used a simple single pixel detection filter:
-1 -1 -1
-1 8 -1
-1 -1 -1
and then some suitable thresholding.
Extrapolation of data from outside of a mask to the mask
A very elegant solution (using only convolutions) is to take the data outside the mask and convolve it with a Gaussian, take the complement of the mask and convolve it with the very same Gaussian, and then divide the two results pixelwise. The result within the mask is the desired blurring.
Mathematically, this is a weighted averaging of the data.
Here is my phantom data:
And this is the identification of the lines
And the final result shows that the distortion has been suppressed tenfold:
And finally my code (in Matlab):
%% create phantom data with lines (1pixel wide bands)
[x, y] = ndgrid(1:100, 1:100);
original = 3 * x - 2 * y + 100 * sin(x / 2) + 120 * cos(y / 3); % funny shapes
bw = original > mean(original(:)); % black and white
distortion = bwmorph(bw,'remove'); % some lines
data = original + max(original(:)) * distortion; % phantom
% show
figure();
subplot(1,3,1); imagesc(original); axis image; colormap(hot); title('original');
subplot(1,3,2); imagesc(distortion); axis image; title('distortion');
subplot(1,3,3); imagesc(data); axis image; title('image');
%% line detection
% filter by single pixel filter
pixel_filtered = filter2([-1,-1,-1;-1,8,-1;-1,-1,-1], data);
% create mask by simple thresholding
mask = pixel_filtered > 0.2 * max(pixel_filtered(:));
% show
figure();
subplot(1,2,1); imagesc(pixel_filtered); axis image; colormap(hot); title('filtered');
subplot(1,2,2); imagesc(mask); axis image; title('mask');
%% line removal and interpolation
% smoothing kernel: gaussian
smooth_kernel = fspecial('gaussian', [3, 3], 1);
smooth_kernel = smooth_kernel ./ sum(smooth_kernel(:)); % normalize to one
% smooth the image outside the mask and divide by the smoothed mask complement
% (the logical mask is cast to double so that filter2/conv2 accept it)
inv_mask = double(~mask);
smoothed = filter2(smooth_kernel, data .* inv_mask) ./ filter2(smooth_kernel, inv_mask);
% within the mask, replace the data with the smoothed values
reconstruction = data .* ~mask + smoothed .* mask;
% show
figure();
subplot(1,3,1); imagesc(reconstruction); axis image; colormap(hot); title('reconstruction');
subplot(1,3,2); imagesc(original); axis image; title('original');
subplot(1,3,3); imagesc(reconstruction - original); axis image; title('difference');
I've already computed the Fundamental Matrix of a stereo pair through corresponding points found using SURF. According to Hartley and Zisserman, the Essential Matrix is computed as:
E = K.t() * F * K
How do I get K? Is there another way to compute E?
I don't know where you got that formula, but the correct one is
E = K'^T . F . K (see Hartley & Zisserman, §9.6, page 257 of the second edition)
K is the matrix of intrinsic camera parameters, holding the scale factors and the position of the image centre, expressed in pixel units.
    | \alpha_u    0     u_0 |
K = |    0     \alpha_v  v_0 |
    |    0        0       1  |
(sorry, Latex not supported on SO)
Edit : To get those values, you can either:
calibrate the camera
compute an approximate value if you have the manufacturer data. If the lens is correctly centered on the sensor, then u_0 and v_0 are half of, respectively, the width and height of the image resolution. And alpha = k.f, with f the focal length (in m) and k the pixel scale factor: if you have a pixel of, say, 6 um, then k = 1/6um.
Example: if the lens is 8 mm and the pixel size 8 um, then alpha = 1000.
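In MATLAB, building K from such manufacturer data and applying the formula above would look roughly like this (a sketch; the 640x480 resolution, the shared intrinsics for both cameras, and the placeholder F are assumptions):
f     = 8e-3;                 % focal length: 8 mm
pitch = 8e-6;                 % pixel size: 8 um
alpha = f / pitch;            % = 1000, as in the example above
u0 = 640 / 2; v0 = 480 / 2;   % assumed 640x480 sensor with the lens centered
K = [alpha, 0,     u0;
     0,     alpha, v0;
     0,     0,     1];
F = eye(3);                   % placeholder: use the F estimated from your SURF matches
E = K' * F * K;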
Computing E
Sure, there are several ways to compute E. For example, if you have strongly calibrated the camera rig, then you can extract R and t (the rotation matrix and translation vector) between the two cameras, and E is defined as the product of the skew-symmetric matrix [t]_x and the matrix R.
But if you have the book, all of this is inside.
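For example, a sketch with made-up pose values:
R = eye(3);                   % rotation between the two cameras (example value)
t = [1; 0; 0];                % translation between the two cameras (example value)
t_x = [   0,  -t(3),  t(2);
        t(3),    0,  -t(1);
       -t(2),  t(1),    0 ];  % skew-symmetric matrix [t]_x
E = t_x * R;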
Edit: Just noticed there is even a Wikipedia page on this topic!
I need to apply a simple filter to a digital image. It says that for each pixel, I need to get the median of the closest pixels. I wonder, since the image is for example M x M, what are the closest pixels? Are they just the left, right, upper and lower pixels plus the current pixel (5 pixels in total), or do I need to take into account all 9 pixels in a 3x3 area?
Follow up question: what if I want the median of the N closest pixels (N = 3)?
Thanks.
I am guessing you are trying to apply a median filter to a sample image. By definition of the median for an image, you need to look at the neighbouring pixels and find the median. Two sizes are important here: one is the image size, m*n, and the other is the filter kernel size, x*y. If the kernel is of size 3*3, you will need to look at all 9 pixels of that neighbourhood.
Finding the median of an odd number of pixels is easy: consider 3 pixels x1, x2 and x3 arranged in ascending order of their values. The median of this set of pixels is x2.
Now, if you have an even number of pixels, usually the average of the two pixels lying midway is computed. For example, say there are 4 pixels x1, x2, x3 and x4 arranged in ascending order of their values. The median of this set of pixels is (x2+x3)/2.
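A minimal MATLAB sketch of the 3*3 case (the test image is just an example; medfilt2 is from the Image Processing Toolbox):
I = imread('cameraman.tif');           % any gray-scale test image
J = medfilt2(I, [3 3]);                % median over the full 3x3 neighbourhood

% The same thing by hand for a single interior pixel (r, c):
r = 50; c = 50;
block = double(I(r-1:r+1, c-1:c+1));   % the 9 closest pixels, current one included
m = median(block(:));                  % their median replaces I(r, c)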
I'm trying to evaluate the complexity of some basic image filtering algorithms. I was wondering if you could verify this theory:
For a basic pixel-by-pixel filter like Inverse, the number of operations grows linearly with the size of the input (in pixels):
Let S = Length of the side of the image
Let M = # pixels input
Inverse is of order O(M) or O(S^2).
A convolution filter, on the other hand, has a parameter R which determines the size of the neighbourhood convolved to establish each new pixel value.
Let R = Radius of convolution filter
Convolution is of order O(M * (2R+1)^2) = O(M * 4R^2) = O(MR^2)
Or should I let N = the size of the convolution filter (Neighbourhood) in pixels?
O(M * N) = O(MN)
Ultimately, the cost of a convolution filter is linear in the product of the number of image pixels and the number of pixels in the neighbourhood.
If you have any links to a paper where this has been documented it would be greatly appreciated.
Kind regards,
Gavin
O(MN) seems right if I understand correctly that, for each pixel in the image, the convolution adjusts the pixel values in the neighbourhood N, regardless of whether N is square. N could be a best-fit triangle ... but provided the pixels in the neighbourhood are adjusted for each pixel in the image, O(MN) makes sense, because the dependency is on the pixels adjusted per pixel in the source image.
Interestingly, in a non-regular neighbourhood some pixels may be adjusted by the neighbourhood mask more than others, but O(MN) will still stand.
If the neighbourhood is centred on a pixel P and then moved to the next P that was not in the neighbourhood (meaning each pixel is transformed only once), then this doesn't stand.
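As a counting sketch of that argument (a toy MATLAB example; the image and kernel sizes are arbitrary, and borders are ignored for simplicity):
% Direct convolution with a kernel of N = k*k pixels applied at every one
% of the M image pixels costs on the order of M*N multiply-adds.
img    = rand(256, 256);                        % M = 256*256 pixels
kernel = ones(5, 5) / 25;                       % N = 25 neighbourhood pixels (R = 2)
r = (size(kernel, 1) - 1) / 2;
out = zeros(size(img));
ops = 0;
for i = 1 + r : size(img, 1) - r
    for j = 1 + r : size(img, 2) - r
        patch     = img(i - r : i + r, j - r : j + r);
        out(i, j) = sum(sum(patch .* kernel));  % N multiply-adds per output pixel
        ops = ops + numel(kernel);
    end
end
% ops is approximately M*N, i.e. O(MN); with N = (2R+1)^2 this is O(M*R^2)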