Box function with 11 elements - image-processing

There is an image and it is blurred horizontally with an 11 element box filter. My question is: how can the box filter be 11 elements? And since it says horizontal direction here, could it be a kernel 11 element row vector?

Yes, it's just a mean filter kernel. Think of it as an 11 element kernel where each coefficient is equal to 1/11. So in effect it sets each output pixel to the average value of the input pixels in the range -5 to +5 in the X axis.

Related

whats the ranges of the red in HSV?

i want to detect red objects in an image.so i convert RGB img to HSV. so in order to know the range of red color i used color pallet on this site
https://alloyui.com/examples/color-picker/hsv
I found out that H(Hue) is falling between 0 to 10 as a lower limit and 340 to 359 as an upper limit. also i found out that the maximum value of S(Saturation) and V(value) is 100. but the problem is that i found some people say the ranges of red H: 0 to 10 as lower limit and 160 to 180 as uper limit.
https://solarianprogrammer.com/2015/05/08/detect-red-circles-image-using-opencv/
OpenCV better detection of red color?
also they said the maximum S and V is 255.This is color i got when i tried to find the upper limit of the red
There are different definitions of HSV, so the values your particular conversion function gives are the ones your should use. Measuring them is the best way to know for sure.
In principle H is an angle, so it goes from 0 to 360, with red centered around 0 (and understanding that 360==0). But some implementations will divide that by 2 to fit it in 8 bits. Others scale to a full 0-255 rage for the 8 bits.
The same is true for S and V. Sometimes they're values between 0 and 100, sometimes they go up to 255.
To measure, create an image where you have pure red pixels (RGB value 255,0,0), and convert. That will give your the center of red hue (H) and the max saturation (S). Then make an image that changes from orange to purple, these colors are near red. You should then see the range of H. Finally, make a pure white image (255,255,255). This will have maximum intensity (V).

structure of opencv's hog output

I'm extracting the HOG features of a grayscale image using OpenCV's HOG implementation. Assuming that my image matches the default window size, i.e. 128x64, I'm struggling to understand correctly how that feature vector is organised. This is what I know:
Every cell outputs a 9 elements histogram quantifying the orientations of the edges lying within that cell (8x8 pixels by default).
Each block contains 2x2 cells.
By default, an 8x8 block stride is used.
This results in a 7*15*9*4 = 3780 elements feature vector. 7 and 15 are the number of blocks that fit horizontally and vertically when a 50% block overlap is used. All great until here.
If we examine the features of the first block, i.e. the first 9*4 elements, how are they arranged? Do the first 9 bins correspond to the top left cell in the block? what about the next 9? and the next?
And which orientation angle does each of the 9 bins represent? Does bins[0] = 0, bins[1] = 20, bins[2] = 40, ... bins[8] = 160. Or is the order different, for instance going from -pi/2 to +pi/2?

How to transform filter when using FFT to do 2d convolution?

I want to use FFT to accelerate 2D convolution. The filter is 15 x 15 and the image is 300 x 300. The filter's size is different with image so I can not doing dot product after FFT. So how to transform the filter before doing FFT so that its size can be matched with image?
I use the convention that N is kernel size.
Knowing the convolution is not defined (mathematically) on the edges (N//2 at each end of each dimension), you would loose N pixels in totals on each axis.
You need to make room for convolution : pad the image with enough "neutral values" so that the edge cases (junk values inserted there) disappear.
This would involve making your image a 307x307px image (with suitable padding values, see next paragraph), which after convolution gives back a 300x300 image.
Popular image processing libraries have this already embedded : when you ask for a convolution, you have extra arguments specifying the "mode".
Which values can we pad with ?
Stolen with no shame from Numpy's pad documentation
'constant' : Pads with a constant value.
'edge' : Pads with the edge values of array.
'linear_ramp' : Pads with the linear ramp between end_value and the arraydge value.
'maximum' :
Pads with the maximum value of all or part of the
vector along each axis.
'mean'
Pads with the mean value of all or part of the
vector along each axis.
'median'
Pads with the median value of all or part of the
vector along each axis.
'minimum'
Pads with the minimum value of all or part of the
vector along each axis.
'reflect'
Pads with the reflection of the vector mirrored on
the first and last values of the vector along each
axis.
'symmetric'
Pads with the reflection of the vector mirrored
along the edge of the array.
'wrap'
Pads with the wrap of the vector along the axis.
The first values are used to pad the end and the
end values are used to pad the beginning.
It's up to you, really, but the rule of thumb is "choose neutral values for the task at hand".
(For instance, padding with 0 when doing averaging makes little sense, because 0 is not neutral in an average of positive values)
it depends on the algorithm you use for the FFT, because most of them need to work with images of dyadic dimensions (power of 2).
Here is what you have to do:
Padding image: center your image into a bigger one with dyadic dimensions
Padding kernel: center you convolution kernel into an image with same dimensions as step 1.
FFT on the image from step 1
FFT on the kernel from step 2
Complex multiplication (Fourier space) of results from steps 3 and 4.
Inverse FFT on the resulting image on step 5
Unpadding on the resulting image from step 6
Put all 4 blocs into the right order.
If the algorithm you use does not need dyadic dimensions, then steps 1 is useless and 2 has to be a simple padding with the image dimensions.

What does closest pixels mean in a digital image?

I need to apply some simple filter to a digital image. It says that for each pixel, I need to get the median of the closest pixels. I wonder since the image for example is M x M. What are the closest pixels? Are they just left, right, upper, lower pixel, and the current pixel (in total 5 pixels) or I need to take into account all the 9 pixels in a 3x3 area?
Follow up question: what if I want the median of the N closest pixels (N = 3)?
Thanks.
I am guessing you are trying to apply median filter to a sample image. By definition of median for an image, you need to look at the neighboring pixels and find the median. There are two definitions which is important, one is the image size which is mn and the other filter kernel size which xy. If the kernel size is of size 3*3, you will need to look at 9 pixels like this:
Find the median of a odd number of pixels is easy, consider you have three 3 pixels x1, x2 and x3 arranged in ascending order of their values. The median of this set of pixels is x2.
Now, if you have an even number of pixels, usually the average of two pixels lying midway is computed. For example, say there are 4 pixels x1, x2, x3 and x4 arranged in ascending order of their values. The median of this set of pixels is (x1+x2)/2.

Convolution theory vs implementation

I study convolution in image processing as it is a part of the curriculum, I understand the theory and the formula but I am confused about its implementation.
The formula is:
What I understand
The convolution kernel is flipped both horizontally and vertically then the values in the kernel are multiplied by the corresponding pixel values, the results are summed, divided by "row x column" to get the average, and then finally this result is the value of the pixel at the center of the kernel location.
Confusion in implementation
When I run the example convolution program from my course material and insert as input a 3x3 convolution kernel where:
1st row: (0, 1, 0)
2nd row: (0, 0, 0)
3rd row: (0, 0, 0)
The processed image is shifted down by one pixel, where I expected it to shift upwards by one pixel. This result indicates that no horizontal or vertical flipping is done before calculating (as if it is doing correlation).
I thought there might be a fault in the program so I looked around and found that Adobe Flex 3 and Gimp are doing this as well.
I don't understand, is there something that I missed to notice?
Appreciate any help or feedback.
I guess the programs you tried implement correlation instead of convolution.
I've tried your filter in Mathematica using the ImageFilter function, the result is shifted upwards as expected:
result:
I've also tried it in Octave (an open source Matlab clone):
imfilter([1,1,1,1,1;
2,2,2,2,2;
3,3,3,3,3;
4,4,4,4,4;
5,5,5,5,5],
[0,1,0;
0,0,0;
0,0,0],"conv")
("conv" means convolution - imfilter's default is correlation). Result:
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
0 0 0 0 0
Note that the last row is different. That's because different implementations use different padding (by default). Mathematica uses constant padding for ImageConvolve, no padding for ListConvolve. Octave's imfilter uses zero padding.
Also note that (as belisarius mentioned) the result of a convolution can be smaller, same size or larger than the source image. (I've read the terms "valid", "same size" and "full" convolution in the Matlab and IPPI documentation, but I'm not sure if that's standard terminology). The idea is that the summation can either be performed
only over the source image pixels where the kernel is completely inside the image. In that case, the result is smaller than the source image.
over every source pixel. In that case, the result has the same size as the source image. This requires padding at the borders
over every pixel where any part of the kernel is inside the source image. In that case, the result image is larger than the source image. This also requires padding at the borders.
Please note that:
Results in:
So, the "shifting" is not real, as the dimensions are affected.

Resources