Convolution theory vs implementation - image-processing

I'm studying convolution in image processing as part of my curriculum. I understand the theory and the formula, but I am confused about its implementation.
The formula is:
g(x, y) = sum_i sum_j h(i, j) * f(x - i, y - j)
What I understand
The convolution kernel is flipped both horizontally and vertically, then each kernel value is multiplied by the corresponding pixel value and the results are summed (divided by "rows x columns" to get the average, for an averaging filter); finally, this result becomes the value of the pixel at the centre of the kernel location.
Confusion in implementation
When I run the example convolution program from my course material and insert as input a 3x3 convolution kernel where:
1st row: (0, 1, 0)
2nd row: (0, 0, 0)
3rd row: (0, 0, 0)
The processed image is shifted down by one pixel, whereas I expected it to shift up by one pixel. This result indicates that no horizontal or vertical flipping is done before calculating (as if it were doing correlation).
I thought there might be a fault in the program, so I looked around and found that Adobe Flex 3 and GIMP do this as well.
Is there something I'm missing?
Appreciate any help or feedback.

I guess the programs you tried implement correlation instead of convolution.
I've tried your filter in Mathematica using the ImageFilter function; the result is shifted upwards as expected.
I've also tried it in Octave (an open source Matlab clone):
imfilter([1,1,1,1,1;
2,2,2,2,2;
3,3,3,3,3;
4,4,4,4,4;
5,5,5,5,5],
[0,1,0;
0,0,0;
0,0,0],"conv")
("conv" means convolution - imfilter's default is correlation). Result:
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
0 0 0 0 0
Note that the last row is different. That's because different implementations use different padding (by default). Mathematica uses constant padding for ImageConvolve, no padding for ListConvolve. Octave's imfilter uses zero padding.
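The Octave result above can also be reproduced in plain NumPy. This is a minimal sketch assuming zero padding and same-size output; the function name is mine:

```python
import numpy as np

def convolve2d_same(f, k):
    """2-D convolution with zero padding; output has the same size as f."""
    kh, kw = k.shape
    kf = k[::-1, ::-1]                       # flip the kernel: conv, not corr
    fp = np.pad(f, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(f)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            out[i, j] = np.sum(fp[i:i + kh, j:j + kw] * kf)
    return out

f = np.arange(1, 6).repeat(5).reshape(5, 5)  # rows of 1s, 2s, ..., 5s
k = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
print(convolve2d_same(f, k))                 # rows shift up; last row zero
```

Dropping the `k[::-1, ::-1]` flip turns this into correlation, which shifts the image the other way, exactly the discrepancy described in the question.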
Also note that (as belisarius mentioned) the result of a convolution can be smaller, the same size, or larger than the source image. (I've read the terms "valid", "same size" and "full" convolution in the Matlab and IPPI documentation, but I'm not sure if that's standard terminology.) The idea is that the summation can be performed:
- only over the source image pixels where the kernel is completely inside the image. In that case, the result is smaller than the source image.
- over every source pixel. In that case, the result has the same size as the source image. This requires padding at the borders.
- over every pixel where any part of the kernel is inside the source image. In that case, the result image is larger than the source image. This also requires padding at the borders.
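The same three options exist for NumPy's one-dimensional convolution, which makes the size difference easy to check (a sketch):

```python
import numpy as np

signal = np.ones(5)   # 5-sample signal
kernel = np.ones(3)   # 3-sample kernel
for mode in ("valid", "same", "full"):
    print(mode, np.convolve(signal, kernel, mode).size)
# valid 3, same 5, full 7
```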

So, the "shifting" is not real, as the dimensions are affected.


Mean Filter at first position (0,0)

Actually, I am in the middle of implementing adaptive thresholding using the mean. I use a 3x3 matrix, calculate the mean value over that matrix, and put it into M(1,1), the middle position of the matrix. I got confused about how to perform the process at the first position f(0,0).
Here is a little illustration. Let's assume that I am using a 3x3 matrix (M) over image f, with the first position f(0,0) = M(1,1) = 4. So M(0,0), M(0,1), M(0,2), M(1,0) and M(2,0) have no value:
-1 | -1 | -1 |
-1 | 4 | 3 |
-1 | 2 | 1 |
Which one is the correct process?
a) ( 4 + 3 + 2 + 1 ) / 4
b) ( 4 + 3 + 2 + 1) / 9
I ask because I followed a tutorial on adaptive mean thresholding and it shows a different result, so I need to make sure the process is correct. Thanks.
There is no "correct" way to solve this issue. There are many different solutions used in practice, they all have some downsides:
Averaging over only the known values (i.e. your suggested (4+3+2+1)/4). By averaging over fewer pixels, one obtains a result that is more sensitive to noise (i.e. the "amount of noise" left in the image after filtering is larger near the borders). Also, a bias is introduced, since the averaging happens over values to one side only.
Assuming 0 outside the image domain (i.e. your suggested (4+3+2+1)/9). Since we don't know what is outside the image, assuming 0 is as good as anything else, no? Well, no it is not. This leads to a filter result that has darker values around the edges.
Assuming a periodic image. Here one takes values from the opposite side of the image for the unknown values. This effectively happens when computing the convolution through the Fourier domain. But usually images are not periodic, with strong differences in intensities (or colors) at opposite sides of the image, leading to "bleeding" of the colors onto the opposite side of the image.
Extrapolation. Extending image data by extrapolation is a risky business. This basically comes down to predicting what would have been in those pixels had we imaged them. The safest bet is 0-order extrapolation (i.e. replicating the boundary pixel), though higher-order polynomial fits are possible too. The downside is that the pixels at the image edge become more important than other pixels: they will be weighted more heavily in the averaging.
Mirroring. Here the image is reflected at the boundary (imagine placing a mirror at the edge of the image). The value at index -1 is taken to be the value at index 1; at index -2 that at index 2, etc. This has similar downsides as the extrapolation method.
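Several of these strategies map directly onto NumPy's padding modes. A sketch using the question's 2x2 corner values (the helper name is mine):

```python
import numpy as np

def mean3x3_at_origin(img, mode):
    """Mean of the 3x3 window centred on pixel (0, 0), with the
    out-of-image values supplied by np.pad's border mode."""
    padded = np.pad(img.astype(float), 1, mode=mode)
    return padded[0:3, 0:3].mean()

img = np.array([[4, 3],
                [2, 1]])
print(img.mean())                          # known values only: 10/4 = 2.5
print(mean3x3_at_origin(img, "constant"))  # assume 0 outside:  10/9
print(mean3x3_at_origin(img, "wrap"))      # periodic image:    2.0
print(mean3x3_at_origin(img, "edge"))      # replicate border:  3.0
print(mean3x3_at_origin(img, "reflect"))   # mirroring (== wrap on a 2x2)
```

Each mode gives a different answer for the same corner pixel, which is exactly why there is no single "correct" result.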

structure of opencv's hog output

I'm extracting the HOG features of a grayscale image using OpenCV's HOG implementation. Assuming that my image matches the default window size, i.e. 128x64, I'm struggling to understand correctly how that feature vector is organised. This is what I know:
Every cell outputs a 9 elements histogram quantifying the orientations of the edges lying within that cell (8x8 pixels by default).
Each block contains 2x2 cells.
By default, an 8x8 block stride is used.
This results in a 7*15*9*4 = 3780 elements feature vector. 7 and 15 are the number of blocks that fit horizontally and vertically when a 50% block overlap is used. All great until here.
If we examine the features of the first block, i.e. the first 9*4 elements, how are they arranged? Do the first 9 bins correspond to the top left cell in the block? what about the next 9? and the next?
And which orientation angle does each of the 9 bins represent? Do bins[0] = 0, bins[1] = 20, bins[2] = 40, ..., bins[8] = 160? Or is the order different, for instance going from -pi/2 to +pi/2?
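The 3780 figure follows from OpenCV's default parameters; the arithmetic can be checked in a few lines (this only verifies the sizes, not the within-block ordering the question asks about):

```python
# OpenCV HOGDescriptor defaults for the 64x128 detection window.
win_w, win_h = 64, 128
block = 16      # block size in pixels (2x2 cells)
stride = 8      # block stride (50% overlap)
cell = 8        # cell size in pixels
nbins = 9       # orientation bins per cell

blocks_x = (win_w - block) // stride + 1      # 7
blocks_y = (win_h - block) // stride + 1      # 15
cells_per_block = (block // cell) ** 2        # 4
length = blocks_x * blocks_y * cells_per_block * nbins
print(length)  # 3780
```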

How to transform filter when using FFT to do 2d convolution?

I want to use the FFT to accelerate a 2D convolution. The filter is 15 x 15 and the image is 300 x 300. The filter's size differs from the image's, so I cannot take the element-wise product after the FFT. How should I transform the filter before the FFT so that its size matches the image?
I use the convention that N is kernel size.
Knowing the convolution is not defined (mathematically) on the edges (N//2 pixels at each end of each dimension), you would lose N - 1 pixels in total on each axis.
You need to make room for the convolution: pad the image with enough "neutral values" so that the edge cases (the junk values inserted there) disappear.
This would involve making your image a 314x314 px image (N//2 = 7 pixels of suitable padding on each side, see the next paragraph), which after convolution gives back a 300x300 image.
Popular image processing libraries have this already embedded : when you ask for a convolution, you have extra arguments specifying the "mode".
Which values can we pad with ?
Stolen with no shame from Numpy's pad documentation
'constant': Pads with a constant value.
'edge': Pads with the edge values of the array.
'linear_ramp': Pads with the linear ramp between end_value and the array edge value.
'maximum': Pads with the maximum value of all or part of the vector along each axis.
'mean': Pads with the mean value of all or part of the vector along each axis.
'median': Pads with the median value of all or part of the vector along each axis.
'minimum': Pads with the minimum value of all or part of the vector along each axis.
'reflect': Pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.
'symmetric': Pads with the reflection of the vector mirrored along the edge of the array.
'wrap': Pads with the wrap of the vector along the axis. The first values are used to pad the end and the end values are used to pad the beginning.
It's up to you, really, but the rule of thumb is "choose neutral values for the task at hand".
(For instance, padding with 0 when doing averaging makes little sense, because 0 is not neutral in an average of positive values)
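A quick way to see what a few of these modes do, sketched on a tiny 1-D vector:

```python
import numpy as np

v = np.array([1, 2, 3])
print(np.pad(v, 2, mode="edge"))     # [1 1 1 2 3 3 3]
print(np.pad(v, 2, mode="reflect"))  # [3 2 1 2 3 2 1]
print(np.pad(v, 2, mode="wrap"))     # [2 3 1 2 3 1 2]
```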
It depends on the algorithm you use for the FFT, because most of them need to work with images of dyadic dimensions (powers of 2).
Here is what you have to do:
Padding image: center your image into a bigger one with dyadic dimensions
Padding kernel: center your convolution kernel into an image with the same dimensions as step 1.
FFT on the image from step 1
FFT on the kernel from step 2
Complex multiplication (Fourier space) of results from steps 3 and 4.
Inverse FFT on the resulting image from step 5
Unpadding of the resulting image from step 6
Put all 4 blocks into the right order.
If the algorithm you use does not need dyadic dimensions, then step 1 is unnecessary and step 2 becomes a simple padding to the image dimensions.
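The steps above can be sketched with NumPy, whose FFT does not require dyadic sizes, so step 1 is skipped. This is a circular ("wrap"-padded) convolution; the function name is mine:

```python
import numpy as np

def fft_convolve2d(image, kernel):
    """Same-size circular 2-D convolution via the FFT."""
    kh, kw = kernel.shape
    # Step 2: zero-pad the kernel to the image's dimensions.
    padded = np.zeros(image.shape)
    padded[:kh, :kw] = kernel
    # Move the kernel centre to (0, 0) so the output is not translated
    # (the "put all 4 blocks into the right order" step, done up front).
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    # Steps 3-6: forward FFTs, pointwise product in Fourier space,
    # inverse FFT; take the real part to drop numerical noise.
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))
```

With the shift kernel from the first question, this moves each row up (circularly), matching direct convolution with wrap padding.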

Threshold to amplify black lines

Given an image (like the one given below), I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried with two thresholding functions. The problem is I can't get the perfect edges using either of these functions. Any help would be greatly appreciated.
The filters I have tried are the Euclidean distance in the RGB and HSV spaces.
Sample image:
Here it is after running an RGB threshold filter (at 40%; more artefacts appear beyond this):
Here it is after running an HSV threshold filter (at 30% the paths become barely visible, but the result is clearly unusable because of the noise):
The code I am using is pretty straightforward. Convert the input image to the appropriate color space and check the Euclidean distance with the black color:
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. Best results come with a window larger than 11x11; don't forget to choose a negative value for the threshold constant.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel)
    Pixel_Binary = 1;
else
    Pixel_Binary = 0;
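That pseudocode can be sketched in NumPy. This is a naive loop, not an optimised integral-image version, and the window size and constant are illustrative defaults of mine:

```python
import numpy as np

def adaptive_mean_threshold(img, window=11, c=-10.0):
    """White (255) where pixel + c exceeds the local window mean, else 0."""
    r = window // 2
    padded = np.pad(img.astype(float), r, mode="edge")
    out = np.zeros(img.shape, dtype=np.uint8)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            local_mean = padded[i:i + window, j:j + window].mean()
            if img[i, j] + c > local_mean:
                out[i, j] = 255
    return out
```

With a negative c, a pixel must be noticeably brighter than its neighbourhood to come out white, so dark lines stay black regardless of slow illumination changes across the page.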
Due to the noise and the illumination variation, you may need adaptive local thresholding; thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do the mean or the median local thresholding, I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more information on this technique.
To make sure the thresholding was working fine, I skeletonized it to see if there is a line break. This skeleton may be the one needed for further processing.
To get rid of the remaining noise, you can just find the longest connected component in the skeletonized image.
Thank you.
You probably want to do this as a three-step operation.
Use levelling, not just thresholding: take the input and scale the intensities (gamma-correct) with parameters that simply dull the mid-tones without removing the darks or the lights (your RGB threshold is too strong, for instance; you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
You could try a black top-hat transform. This involves subtracting the image from the closing of the image. I used a structuring element window size of 11 and a constant threshold of 0.1 (25.5 on a 0-255 scale).
You should get something like:
Which you can then easily threshold:
Best of luck.
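The black top-hat described above can be sketched in plain NumPy with a flat structuring element (max filter = dilation, min filter = erosion; the helper names are mine):

```python
import numpy as np

def _window_filter(img, size, func):
    """Apply func (np.max or np.min) over a size x size window."""
    r = size // 2
    p = np.pad(img, r, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = func(p[i:i + size, j:j + size])
    return out

def black_tophat_binarize(img, size=11, thresh=25.5):
    """Black top-hat (grey closing minus image), then a fixed threshold."""
    dilated = _window_filter(img, size, np.max)   # grey dilation
    closing = _window_filter(dilated, size, np.min)  # then grey erosion
    return ((closing - img) > thresh).astype(np.uint8) * 255
```

The closing fills in dark features thinner than the structuring element, so subtracting the original leaves exactly those thin dark lines, which a fixed threshold then isolates.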

Adjustable sharpen color matrix

I am using the .NET AForge libraries to sharpen an image. The "Sharpen" filter uses the following matrix:
0 -1 0
-1 5 -1
0 -1 0
This does in fact sharpen the image, but I need to sharpen more aggressively, based on a numeric range, let's say 1-100.
Using AForge, how do I transform this matrix with numbers 1 through 100, where 1 is barely noticeable and 100 is very noticeable?
Thanks in advance!
The one property of a filter like this that must be maintained is that all the values sum to 1. You can subtract 1 from the middle value, multiply by some constant, then add 1 back to the middle, and it will be scaled properly. Play around with the range (100 is almost certainly too large) until you find something that works.
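That scaling (subtract the identity kernel, multiply, add it back) can be sketched as follows; the 0.05 factor mapping 1-100 onto the multiplier is an arbitrary choice of mine to tune:

```python
def sharpen_kernel(strength):
    """Sharpen matrix whose entries always sum to 1.

    Computes identity + s * (base - identity); with the (arbitrary)
    0.05 scale factor, strength 20 reproduces the original matrix.
    """
    s = strength * 0.05
    base = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
    ident = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    return [[ident[i][j] + s * (base[i][j] - ident[i][j])
             for j in range(3)] for i in range(3)]

k = sharpen_kernel(100)
print(sum(sum(row) for row in k))  # always 1.0
```

Because both the base matrix and the identity sum to 1, any linear blend of them also sums to 1, so brightness is preserved at every strength.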
You might also try using a larger filter matrix, or one that has values in the corners as well.
I would also suggest looking at the GaussianSharpen class and adjusting the sigma value.
