What does closest pixels mean in a digital image? - image-processing

I need to apply a simple filter to a digital image. It says that for each pixel, I need to get the median of the closest pixels. Suppose the image is M x M: what are the closest pixels? Are they just the left, right, upper and lower pixels plus the current pixel (5 pixels in total), or do I need to take into account all 9 pixels in a 3x3 area?
Follow up question: what if I want the median of the N closest pixels (N = 3)?
Thanks.

I am guessing you are trying to apply a median filter to a sample image. By definition of the median filter, for each pixel you look at the neighboring pixels and take their median. Two sizes matter here: the image size, m × n, and the filter kernel size, x × y. If the kernel is 3*3, you need to look at 9 pixels: the current pixel and its 8 neighbours.
Finding the median of an odd number of pixels is easy: say you have 3 pixels x1, x2 and x3 arranged in ascending order of their values. The median of this set of pixels is x2.
If you have an even number of pixels, the average of the two middle pixels is usually computed. For example, say there are 4 pixels x1, x2, x3 and x4 arranged in ascending order of their values. The median of this set of pixels is (x2+x3)/2.
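As a rough illustration of the two neighbourhood choices from the question, here is a minimal NumPy sketch. The function name median_filter_3x3, the use_cross flag and the "edge" border handling are assumptions made for this example, not something the question or answer prescribes.

```python
import numpy as np

def median_filter_3x3(image, use_cross=False):
    """Median over a 3x3 window (9 pixels) or a plus-shaped window (5 pixels)."""
    h, w = image.shape
    # Repeat border pixels so every pixel has a full neighbourhood (an assumption).
    padded = np.pad(image, 1, mode="edge")
    if use_cross:
        # Centre pixel plus its upper, left, right and lower neighbours.
        offsets = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)]
    else:
        # All 9 positions of the 3x3 window.
        offsets = [(dy, dx) for dy in range(3) for dx in range(3)]
    # One shifted copy of the image per neighbour, stacked along a new axis.
    windows = np.stack([padded[dy:dy + h, dx:dx + w] for dy, dx in offsets], axis=0)
    return np.median(windows, axis=0)

image = np.array([[1, 2, 3],
                  [4, 100, 6],
                  [7, 8, 9]])
print(median_filter_3x3(image)[1, 1])
print(median_filter_3x3(image, use_cross=True)[1, 1])
```

In both cases the outlier value 100 at the centre is replaced by a value taken from its neighbourhood, which is the point of a median filter.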

Related

Relationship of standard deviation for Gaussian filter between pixel domain and the real world

I set up an experiment with Gaussian blur in the real world and in MR images. I printed some blurred test images and compared them with blurred augmented images.
What is the best way to express how much blurring I applied in real-world coordinates?
The image is 2560x1440 pixels, corresponding to 533x300 cm in the real world. If this image is blurred with a Gaussian with standard deviation n (filter size is ceil(3 * n) * 2 + 1), how can this be expressed in centimeters? Is it reasonable to express it as the real size of the filter in centimeters?
In short, yes, it is perfectly reasonable to express the size of the kernel in real-world coordinates.
In your case, you have 533 cm == 2560 pixels horizontally, which is 0.2082 cm per pixel. (Please edit if the question has a mistake and this should be mm instead of cm.) Vertically you have approximately the same, so we can assume isotropic sampling and leave it at 0.208 cm/px.
Given that pixel size, a standard deviation of the Gaussian of n is equivalent to a standard deviation of 0.208*n cm in the real world.
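The conversion can be sketched in a few lines of Python. The function names are made up for this example; the image width and physical width are the ones from the question, and the kernel-size formula is the one the question quotes.

```python
import math

def sigma_px_to_cm(sigma_px, width_px=2560, width_cm=533.0):
    """Convert a Gaussian standard deviation from pixels to centimetres."""
    cm_per_px = width_cm / width_px  # about 0.2082 cm per pixel
    return sigma_px * cm_per_px

def kernel_size_px(sigma_px):
    """Filter size as given in the question: ceil(3 * n) * 2 + 1."""
    return math.ceil(3 * sigma_px) * 2 + 1

sigma = 2.0  # example value in pixels, chosen arbitrarily
print(sigma_px_to_cm(sigma))                  # sigma in cm
print(kernel_size_px(sigma))                  # kernel width in pixels
print(sigma_px_to_cm(kernel_size_px(sigma)))  # kernel width in cm
```

The same pixel-to-centimetre factor applies to the kernel width, so either the standard deviation or the full filter size can be reported in real-world units.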

Take pixels from input image (subsampling of image pixels)

How to take pixels from an input image by using Gaussian sub-sampling (shotgun pattern like)?
I want the locations of the sampled pixels to follow a shotgun pattern concentrated in the middle of the image, because I do not want to extract features from all pixels in an image. The output should be the coordinates of the sampled pixels. I will be thankful if you guide me.
Is there any function or code that I can get help from?
Your help is appreciated.
If you are looking for a way to define a Region of Interest (ROI) of an image in Matlab in order to perform some operation on a restricted area, remember that x coordinates correspond to columns and y coordinates to rows (Matlab reads images as matrices).
To crop an image from x1 to x2 and from y1 to y2, try something like
ROI = image(y1:y2, x1:x2)
but how to determine these 4 values without a specific example is up to you.
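For the shotgun-like pattern the question describes, one possible approach (not the one in the answer above, and written in Python rather than Matlab) is to draw coordinates from a 2D Gaussian centred on the image. The function name, the sigma_frac parameter and the clipping of out-of-range samples are all assumptions of this sketch.

```python
import numpy as np

def sample_gaussian_coords(height, width, n_samples, sigma_frac=0.15, seed=0):
    """Return an (n_samples, 2) array of (row, col) pixel coordinates
    drawn from a Gaussian centred on the middle of the image."""
    rng = np.random.default_rng(seed)
    center = np.array([(height - 1) / 2.0, (width - 1) / 2.0])
    # Spread is a fraction of the image size along each axis (an assumption).
    sigma = sigma_frac * np.array([height, width])
    points = rng.normal(loc=center, scale=sigma, size=(n_samples, 2))
    # Round to the nearest pixel and clip samples that fall outside the image.
    rows = np.clip(np.rint(points[:, 0]), 0, height - 1).astype(int)
    cols = np.clip(np.rint(points[:, 1]), 0, width - 1).astype(int)
    return np.column_stack([rows, cols])

coords = sample_gaussian_coords(480, 640, n_samples=200)
print(coords[:5])
```

The same pixel can be drawn more than once, and clipping moves far-out samples onto the border; if that matters, duplicates can be removed with np.unique(coords, axis=0) or out-of-range samples can be redrawn instead.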

How the spatial extent of a superpixel is a region of size S*S?

I am reading the paper Achanta-SLIC Superpixel segmentation, where it says that the superpixel cluster centers are located at a distance of S = root(N/k) from each other, that the expected spatial extent of a superpixel is a region of S * S, and that the search for similar pixels is done in a spatial region of 2S*2S.
Can someone please explain this point to me, as I am stuck on it?
From the paper:
Our algorithm takes as input a desired number of approximately equally-sized
superpixels K.
So, let's assume that our SP are approximately squares. You will have K of them.
For an image with N pixels, the approximate size of each superpixel
is therefore N/K pixels
If you divide the image area N in K SP, every SP has (almost) N/K pixels. I.e., the area of each SP is N/K.
For roughly equally sized superpixels there would be a superpixel center at every grid interval S = sqrt(N/K).
Each SP is assumed to be a square with area N/K. The side of the square is then sqrt(area) = sqrt(N/K) = S. This means that an SP center is a distance S away from its neighbours' centers.
Since the spatial extent of any superpixel is approximately S^2 (the approximate area of a superpixel)
Well, the side of each square is S, then its area is S^2 (which is the same as N/K = sqrt(N/K)^2 = S^2).
we can safely assume that pixels that are associated with this cluster
center lie within a 2S × 2S area around the superpixel center
We said that each side of the square is S, so every pixel of the SP lies within half the diagonal from the center, S/sqrt(2), which is less than the side: S/sqrt(2) < S. But SPs are not exactly squares, so we want to be a little more flexible and search a 2S × 2S window around the center, i.e. up to S away from the center along each axis.
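A small numeric check of these quantities, as a sketch: the image size (300 x 300) and K = 100 are arbitrary example values, and the function name is made up for this illustration.

```python
import math

def slic_grid_interval(n_pixels, k):
    """Grid interval S = sqrt(N / K) between superpixel centers."""
    return math.sqrt(n_pixels / k)

n_pixels = 300 * 300  # N, example image size
k = 100               # K, desired number of superpixels

s = slic_grid_interval(n_pixels, k)
half_diagonal = s / math.sqrt(2)  # farthest pixel of an S x S square from its center

print(s)              # side of each (idealised) square superpixel
print(s * s)          # its area, equal to N / K
print(half_diagonal)  # smaller than S, so a 2S x 2S search window covers it
```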

How to transform filter when using FFT to do 2d convolution?

I want to use FFT to accelerate 2D convolution. The filter is 15 x 15 and the image is 300 x 300. The filter's size is different from the image's, so I cannot do the element-wise product after the FFT. How should I transform the filter before doing the FFT so that its size matches the image?
I use the convention that N is kernel size.
Knowing that the convolution is not defined (mathematically) on the edges (N//2 pixels at each end of each dimension), you would lose 2*(N//2) pixels in total on each axis, i.e. N - 1 for an odd N.
You need to make room for the convolution: pad the image with enough "neutral values" so that the edge cases (junk values inserted there) disappear.
For a 15x15 kernel this means padding 7 pixels on each side, making your image a 314x314px image (with suitable padding values, see next paragraph), which after convolution gives back a 300x300 image.
Popular image processing libraries have this already built in: when you ask for a convolution, you have extra arguments specifying the "mode".
Which values can we pad with?
Stolen with no shame from Numpy's pad documentation
'constant': Pads with a constant value.
'edge': Pads with the edge values of array.
'linear_ramp': Pads with the linear ramp between end_value and the array edge value.
'maximum': Pads with the maximum value of all or part of the vector along each axis.
'mean': Pads with the mean value of all or part of the vector along each axis.
'median': Pads with the median value of all or part of the vector along each axis.
'minimum': Pads with the minimum value of all or part of the vector along each axis.
'reflect': Pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.
'symmetric': Pads with the reflection of the vector mirrored along the edge of the array.
'wrap': Pads with the wrap of the vector along the axis. The first values are used to pad the end and the end values are used to pad the beginning.
It's up to you, really, but the rule of thumb is "choose neutral values for the task at hand".
(For instance, padding with 0 when doing averaging makes little sense, because 0 is not neutral in an average of positive values)
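As a small sketch of the padding step with NumPy's pad function: the 300 x 300 image and 15 x 15 kernel sizes come from the question, while the choice of "reflect" is just one example mode, not a recommendation.

```python
import numpy as np

image = np.random.default_rng(0).random((300, 300))  # stand-in for the real image
kernel_size = 15                                     # N, the kernel side length
half = kernel_size // 2                              # pixels lost at each end: 7

# Pad every side by N // 2 so that a "valid" convolution returns the original size.
padded = np.pad(image, half, mode="reflect")

print(padded.shape)
# Size of the valid region after convolving with an N x N kernel.
print(padded.shape[0] - kernel_size + 1, padded.shape[1] - kernel_size + 1)
```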
It depends on the algorithm you use for the FFT, because many of them need to work with images of dyadic dimensions (a power of 2).
Here is what you have to do:
1. Padding image: center your image into a bigger one with dyadic dimensions.
2. Padding kernel: center your convolution kernel into an image with the same dimensions as in step 1.
3. FFT on the image from step 1.
4. FFT on the kernel from step 2.
5. Complex multiplication (in Fourier space) of the results from steps 3 and 4.
6. Inverse FFT on the resulting image from step 5.
7. Unpadding of the resulting image from step 6.
8. Put all 4 blocks (quadrants) back into the right order.
If the algorithm you use does not need dyadic dimensions, then step 1 is unnecessary and step 2 is a simple padding of the kernel to the image dimensions.
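The steps above can be sketched with NumPy's FFT routines, which do not require dyadic sizes. This is a minimal sketch under a few assumptions: the function name is made up, both arrays are zero-padded at the end (via the s argument) rather than centered, so the quadrant reordering is replaced by a simple crop, and the borders behave as if the image were surrounded by zeros.

```python
import numpy as np

def fft_convolve_same(image, kernel):
    """2D linear convolution via FFT, cropped to the size of `image`."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    # Size of the full linear convolution; padding to this size avoids wrap-around.
    fh, fw = ih + kh - 1, iw + kw - 1
    # rfft2 zero-pads each input up to `s` before transforming.
    f_image = np.fft.rfft2(image, s=(fh, fw))
    f_kernel = np.fft.rfft2(kernel, s=(fh, fw))
    # Element-wise complex multiplication in Fourier space, then inverse FFT.
    full = np.fft.irfft2(f_image * f_kernel, s=(fh, fw))
    # Crop the central part so the output is aligned with the input image.
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + ih, left:left + iw]

image = np.random.default_rng(0).random((300, 300))
kernel = np.ones((15, 15)) / 225.0  # 15 x 15 averaging kernel
result = fft_convolve_same(image, kernel)
print(result.shape)
```

To use a different border treatment, the image can be padded explicitly (for example with one of the modes listed earlier) before calling the function, and the result cropped back afterwards.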

How to apply box filter on integral image? (SURF)

Assuming that I have a grayscale (8-bit) image and assume that I have an integral image created from that same image.
Image resolution is 720x576. According to SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The first octave uses filters with 9x9, 15x15, 21x21 and 27x27 pixels. The second octave uses filters with 15x15, 27x27, 39x39 and 51x51 pixels. The third octave uses filters with 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large and I guess 720x576 is big enough (right??!!), a fourth octave is added, 51x51, 99x99, 147x147 and 195x195. These octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are: What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) to apply filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = DxxDyy − (0.9Dxy)^2
and
det(HessianApprox) = DxxDyy − (0.81Dxy)^2
Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second order derivatives).
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common image analysis approach for region-of-interest detection called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being to mimic what happens in the visual cortex of the human brain.
The drawback of the difference of Gaussians (DoG) is its computation time, which is too expensive for large image areas.
To bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (or equivalently, two Gaussian blurs) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average of all the image values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and to its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-rightmost pixel of the integral image holds the sum of all the original pixels.
Then you just need to notice that the box blur is, up to a division by the rectangle's area, the sum of all the pixels inside a given rectangle (which need not start at the top-leftmost pixel of the image), and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C, D are shorthand for IntegralImagePixelAt(A) (and B, C, D respectively).
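A minimal NumPy sketch of this computation follows. The function names are made up for this example, and the integral image is stored with one extra row and column of zeros (an assumption of this sketch) so that the "strictly above and to the left" convention holds everywhere.

```python
import numpy as np

def integral_image(image):
    """Integral image with an extra leading row and column of zeros:
    ii[r, c] is the sum of image[:r, :c]."""
    h, w = image.shape
    ii = np.zeros((h + 1, w + 1), dtype=np.int64)
    ii[1:, 1:] = image.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of image[top:bottom, left:right] using four lookups: A + D - B - C."""
    a = ii[top, left]      # top-left corner
    b = ii[top, right]     # top-right corner
    c = ii[bottom, left]   # bottom-left corner
    d = ii[bottom, right]  # bottom-right corner
    return a + d - b - c

image = np.arange(20).reshape(4, 5)
ii = integral_image(image)
total = box_sum(ii, 1, 1, 3, 4)
print(total, image[1:3, 1:4].sum())  # both values should agree
print(total / ((3 - 1) * (4 - 1)))   # box blur (mean) over the same rectangle
```

Whatever the rectangle size, the cost is four lookups, which is why large box filters such as 99x99 cost the same as 9x9 ones.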
Integral images in SURF
SURF does not use box blurs of sizes 9x9, etc. directly.
What it uses instead are box-filter approximations of Gaussian derivatives of several orders, i.e. Haar-like features.
Let's take an example. Suppose you are to compute the output of the 9x9 filters. This corresponds to a given sigma, hence a fixed scale/octave.
With sigma fixed, you center your 9x9 window on the pixel of interest. Then you compute the output of the 2nd order Gaussian derivative in each direction (horizontal, vertical, diagonal). Fig. 1 in the paper illustrates the vertical and diagonal filters.
The Hessian determinant
There is a factor that accounts for the scale differences. Let's trust the paper that the determinant is equal to:
Det = DxxDyy - (0.9 * Dxy)^2.
Expanding the square, this is the same as: Det = DxxDyy - 0.81*Dxy^2.
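To make the relation between the two ways of writing the formula concrete, here is a tiny sketch. The function name and the example filter responses are made up; the weight 0.9 is the one quoted from the paper above.

```python
import math

def hessian_det_approx(dxx, dyy, dxy, weight=0.9):
    """Approximated Hessian determinant: Dxx * Dyy - (weight * Dxy)^2."""
    return dxx * dyy - (weight * dxy) ** 2

# Arbitrary example filter responses.
dxx, dyy, dxy = 2.0, 3.0, 1.0

with_weight = hessian_det_approx(dxx, dyy, dxy)
expanded = dxx * dyy - 0.81 * dxy ** 2  # same formula with the square expanded

print(with_weight, expanded)
print(math.isclose(with_weight, expanded))
```

Note that 0.81 multiplies Dxy^2 here; putting 0.81 inside the square, as in the second formula of the question, would give a different value.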
Look at page 17 of this document:
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you already have code for a normal 2D Gaussian convolution, just use the box filter in place of the Gaussian kernel, and use the original image as input rather than the integral image. The results of this method will be the same as those of the method you asked about.
