I am using the .NET AForge libraries to sharpen an image. The "Sharpen" filter uses the following matrix:
0 -1 0
-1 5 -1
0 -1 0
This does in fact sharpen the image, but I need to sharpen the image more aggressively, based on a numeric range, let's say 1-100.
Using AForge, how do I transform this matrix with numbers 1 through 100, where 1 is almost not noticeable and 100 is very noticeable?
Thanks in advance!
The one property of a filter like this that must be maintained is that all the values sum to 1. You can subtract 1 from the middle value, multiply by some constant, then add 1 back to the middle, and it will still be scaled properly. Play around with the range (100 is almost certainly too large) until you find something that works.
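To illustrate the arithmetic (shown here in NumPy rather than AForge; the mapping from the 1-100 slider to a strength factor is an assumption you would tune):

import numpy as np

def sharpen_kernel(strength):
    # 3x3 sharpening kernel whose weights always sum to 1.
    # strength = 0 gives the identity kernel (no change); larger values sharpen more.
    base = np.array([[ 0, -1,  0],
                     [-1,  4, -1],
                     [ 0, -1,  0]], dtype=float)  # original kernel with 1 subtracted from the centre
    kernel = strength * base
    kernel[1, 1] += 1.0                           # add the 1 back to the centre
    return kernel                                 # kernel.sum() == 1 for any strength

# Hypothetical slider mapping: tune the divisor until 100 is "very noticeable".
slider = 60
kernel = sharpen_kernel(slider / 50.0)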
You might also try using a larger filter matrix, or one that has values in the corners as well.
I would also suggest looking at the GaussianSharpen class and adjusting the sigma value.
I have a question about something that is unclear to me regarding the input data preprocessing in YOLO-V4.
If the input image is a grayscale image with 16 bits per pixel, i.e. a pixel value range of [0, 2^16) instead of [0, 2^8), it is mentioned that the values are scaled, so I want to ask:
1- Are these values scaled to be within -1 to 1 or 0 to 1?
2- Which method is used for the scaling (where can I find the relevant piece of code)?
3- Is the scaling done using the maximum value per image or the maximum of the bit depth, i.e. in my case will the maximum used in the scaling be 2^16 or the maximum value actually present in the image, for example 2000 or whatever?
Thanks in advance,
(1) - from 0 to 1.
The scaling works as v = pix / 255.
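As a sketch of that normalisation (NumPy, not the YOLO source code; the 16-bit case is only an assumption that follows the same pattern and is not confirmed above):

import numpy as np

img8 = np.random.randint(0, 256, (416, 416), dtype=np.uint8)   # stand-in for an 8-bit input
scaled = img8.astype(np.float32) / 255.0                        # values now in [0, 1]

# Assumption only: a 16-bit input scaled the same way would divide by the full bit depth.
img16 = np.random.randint(0, 2**16, (416, 416), dtype=np.uint16)
scaled16 = img16.astype(np.float32) / 65535.0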
I have an 8-bit image and I want to filter it with a matrix for edge detection. My kernel matrix is:
0 1 0
1 -4 1
0 1 0
For some indices it gives me a negative value. What am I supposed to do with them?
Your kernel is a Laplace filter. Applying it to an image yields a finite difference approximation to the Laplacian operator. The Laplace operator is not an edge detector by itself.
But you can use it as a building block for an edge detector: you need to detect the zero crossings to find edges (this is the Marr-Hildreth edge detector). To find zero crossings, you need to have negative values.
You can also use the Laplace filtered image to sharpen your image. If you subtract it from the original image, the result will be an image with sharper edges and a much crisper feel. For this, negative values are important too.
For both these applications, clamping the result of the operation, as suggested in the other answer, is wrong. That clamping sets all negative values to 0. This means there are no more zero crossings to find, so you can't find edges, and for the sharpening it means that one side of each edge will not be sharpened.
So, the best thing to do with the result of the Laplace filter is preserve the values as they are. Use a signed 16-bit integer type to store your results (I actually prefer using floating-point types, it simplifies a lot of things).
On the other hand, if you want to display the result of the Laplace filter on a screen, you will have to do something sensible with the pixel values. Common in this case is to add 128 to each pixel. This shifts the zero to a mid-grey value, shows negative values as darker, and positive values as lighter. After adding 128, values above 255 and below 0 can be clipped. You can also stretch the values further if you want to avoid clipping, for example laplace / 2 + 128.
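As a concrete sketch of the above (NumPy/SciPy, assuming an 8-bit grayscale image; this is not code from the original answer):

import numpy as np
from scipy.ndimage import convolve

img = np.random.randint(0, 256, (128, 128)).astype(np.float64)  # stand-in for your 8-bit image

kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]], dtype=float)

# Keep the result in floating point (or signed 16-bit) so negative values survive.
laplace = convolve(img, kernel, mode='nearest')

# Sharpening: subtract the Laplace-filtered image from the original.
sharpened = np.clip(img - laplace, 0, 255).astype(np.uint8)

# Display: shift zero to mid-grey, optionally halving first to reduce clipping.
display = np.clip(laplace / 2 + 128, 0, 255).astype(np.uint8)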
Out of range values are extremely common in JPEG. One handles them by clamping.
If X < 0 then X := 0 ;
If X > 255 then X := 255 ;
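In NumPy terms that clamping is a one-liner (a sketch of the same operation, not the original code):

import numpy as np

x = np.array([-17.0, 42.0, 300.0])
clamped = np.clip(x, 0, 255)   # -> [0., 42., 255.]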
Actually, I am in the middle of work on adaptive thresholding using the mean. I use a 3x3 matrix, so I calculate the mean value over that matrix and place it at M(1,1), the middle position of the matrix. I got confused about how to perform the process at the first position, f(0,0).
Here is a little illustration. Let's assume that I am using a 3x3 matrix (M) and the image's (f) first position is f(0,0) = M(1,1) = 4. So M(0,0), M(0,1), M(0,2), M(1,0) and M(2,0) have no value:
-1 | -1 | -1 |
-1 | 4 | 3 |
-1 | 2 | 1 |
Which one is the correct process?
a) ( 4 + 3 + 2 + 1 ) / 4
b) ( 4 + 3 + 2 + 1) / 9
I ask this because I followed a tutorial on adaptive mean thresholding and it shows a different result, so I need to make sure that the process is correct. Thanks.
There is no "correct" way to solve this issue. There are many different solutions used in practice, and they all have some downsides (the sketch after this list shows how some of them map onto common library border modes):
Averaging over only the known values (i.e. your suggested (4+3+2+1)/4). By averaging over fewer pixels, one obtains a result that is more sensitive to noise (i.e. the "amount of noise" left in the image after filtering is larger near the borders). Also, a bias is introduced, since the averaging happens over values to one side only.
Assuming 0 outside the image domain (i.e. your suggested (4+3+2+1)/9). Since we don't know what is outside the image, assuming 0 is as good as anything else, no? Well, no it is not. This leads to a filter result that has darker values around the edges.
Assuming a periodic image. Here one takes values from the opposite side of the image for the unknown values. This effectively happens when computing the convolution through the Fourier domain. But usually images are not periodic, with strong differences in intensities (or colors) at opposite sides of the image, leading to "bleeding" of the colors onto the opposite side of the image.
Extrapolation. Extending image data by extrapolation is a risky business. This basically comes down to predicting what would have been in those pixels had we imaged them. The safest bet is 0-order extrapolation (i.e. replicating the boundary pixel), though higher-order polynomial fits are possible too. The downside is that the pixels at the image edge become more important than other pixels: they will be weighted more heavily in the averaging.
Mirroring. Here the image is reflected at the boundary (imagine placing a mirror at the edge of the image). The value at index -1 is taken to be the value at index 1; at index -2 that at index 2, etc. This has similar downsides as the extrapolation method.
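Most libraries expose these strategies as a border or padding option. For example, a 3x3 local mean in SciPy (a sketch, not the asker's code) lets you pick among them:

import numpy as np
from scipy.ndimage import uniform_filter

img = np.array([[4., 3., 1.],
                [2., 1., 5.],
                [6., 2., 3.]])

# 3x3 local mean with different assumptions about pixels outside the image:
mean_zero    = uniform_filter(img, size=3, mode='constant', cval=0)  # assume 0 outside (the question's option b)
mean_nearest = uniform_filter(img, size=3, mode='nearest')           # replicate the boundary pixel
mean_mirror  = uniform_filter(img, size=3, mode='mirror')            # mirror at the boundary
mean_wrap    = uniform_filter(img, size=3, mode='wrap')              # periodic image

# Averaging over only the known pixels (the question's option a) has no built-in mode here;
# it is usually done by also filtering a mask of ones and dividing by that result.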
Given an image (like the one given below) I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried with two thresholding functions. The problem is I can't get perfect edges using either of these functions. Any help would be greatly appreciated.
The filters I have tried use the Euclidean distance in the RGB and HSV color spaces.
Sample image:
Here it is after running an RGB threshold filter (at 40%; it produces more artefacts beyond this).
Here it is after running an HSV threshold filter (at 30% the paths become barely visible but are clearly unusable because of the noise).
The code I am using is pretty straightforward: convert the input image to the appropriate color space and check the Euclidean distance to the black color,
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
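A sketch of that distance-to-black test (NumPy; the names and the 40% cutoff are illustrative, not the asker's actual code):

import numpy as np

def black_mask(rgb, max_dist):
    # Euclidean distance of every pixel to black: sqrt(R^2 + G^2 + B^2).
    dist = np.sqrt(np.sum(rgb.astype(np.float64) ** 2, axis=-1))
    return dist < max_dist                     # True where the pixel is "close enough" to black

# e.g. a threshold at 40% of the largest possible distance to black:
# mask = black_mask(image, 0.40 * np.sqrt(3 * 255.0 ** 2))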
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter), the threshold will be adjusted for that area and will (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
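If you just want to try Sauvola quickly, scikit-image also ships an implementation (a sketch; the filename, window size and k are placeholders you would tune):

from skimage import io
from skimage.color import rgb2gray
from skimage.filters import threshold_sauvola

gray = rgb2gray(io.imread('scan.png'))               # hypothetical input file (assumed to be a colour scan)
t = threshold_sauvola(gray, window_size=25, k=0.2)   # per-pixel local threshold
binary = gray > t                                     # True = background, False = dark strokes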
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. The best results come with a window size larger than 11x11; don't forget to choose a negative value for the threshold constant.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel)
    Pixel_Binary = 1;
else
    Pixel_Binary = 0;
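A sketch of that rule applied to the V channel (scikit-image/SciPy; the window size and constant are illustrative values, with the constant negative as suggested):

import numpy as np
from scipy.ndimage import uniform_filter
from skimage.color import rgb2hsv

def adaptive_threshold_v(rgb, window=15, c=-10):
    v = rgb2hsv(rgb)[..., 2] * 255.0              # V channel rescaled to 0..255
    local_mean = uniform_filter(v, size=window)   # average pixel value in the window
    return (v + c > local_mean).astype(np.uint8)  # 1 where the rule holds, else 0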
Due to the noise and the illumination variation you may need adaptive local thresholding (thanks to Beaker for his answer too).
Therefore, I tried the following steps:
Convert it to grayscale.
Do mean or median local thresholding. I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more information on this technique.
To make sure the thresholding was working fine, I skeletonized it to check whether there were any line breaks. This skeleton may be the one needed for further processing.
To get rid of the remaining noise you can just find the longest connected component in the skeletonized image.
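A sketch of those two cleanup steps with scikit-image (assuming binary is the boolean image from the thresholding step; this is not the code actually used above, and it keeps the largest component by pixel count as a stand-in for the longest one):

import numpy as np
from skimage.morphology import skeletonize
from skimage.measure import label

def skeleton_largest_component(binary):
    skel = skeletonize(binary)            # thin the foreground down to 1-pixel-wide lines
    labels = label(skel)                  # label the connected components of the skeleton
    if labels.max() == 0:                 # nothing left after skeletonization
        return skel
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                          # ignore the background label
    return labels == sizes.argmax()       # keep only the largest component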
Thank you.
You probably want to do this as a three-step operation.
use leveling, not just thresholding: take the input and scale the intensities (gamma correct) with parameters that simply dull the mid tones, without removing the darks or the lights (your RGB threshold is too strong, for instance; you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
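A sketch of those three steps (SciPy/scikit-image; the gamma and threshold values are placeholders to tune, and the kernel is the one suggested above normalised by its sum):

import numpy as np
from scipy.ndimage import convolve
from skimage import exposure

def level_convolve_threshold(gray, gamma=1.5, thresh=0.5):
    # gray: grayscale image as floats in [0, 1]

    # 1. leveling: gamma-correct to dull the mid tones
    leveled = exposure.adjust_gamma(gray, gamma)

    # 2. convolve with the suggested 5x5 kernel, normalised to sum to 1
    k = np.array([[1, 2, 3, 2, 1],
                  [2, 3, 4, 3, 2],
                  [3, 4, 5, 4, 3],
                  [2, 3, 4, 3, 2],
                  [1, 2, 3, 2, 1]], dtype=float)
    filtered = convolve(leveled, k / k.sum(), mode='nearest')

    # 3. threshold the result
    return filtered > thresh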
You could try a black top-hat transform. This involves subtracting the image from the closing of the image. I used a structuring element window size of 11 and a constant threshold of 0.1 (25.5 on a 0-255 scale).
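A sketch with SciPy's morphology routines (the window size and threshold follow the values above; gray is assumed to be an 8-bit grayscale version of the image):

import numpy as np
from scipy.ndimage import grey_closing

def black_tophat_binarize(gray, size=11, thresh=25.5):
    closed = grey_closing(gray, size=(size, size))   # morphological closing
    tophat = closed.astype(np.float64) - gray        # black top-hat: closing minus image
    return tophat > thresh                           # constant threshold (0.1 of a 255 scale)

# scipy.ndimage.black_tophat(gray, size=(11, 11)) computes the same transform directly.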
You should get something like:
Which you can then easily threshold:
Best of luck.
I am studying convolution in image processing as part of the curriculum. I understand the theory and the formula, but I am confused about its implementation.
The formula is:
g(x, y) = Σ_s Σ_t h(s, t) · f(x - s, y - t)
What I understand
The convolution kernel is flipped both horizontally and vertically, then the values in the kernel are multiplied by the corresponding pixel values, the results are summed, divided by "rows x columns" to get the average, and finally this result becomes the value of the pixel at the center of the kernel location.
Confusion in implementation
When I run the example convolution program from my course material and insert as input a 3x3 convolution kernel where:
1st row: (0, 1, 0)
2nd row: (0, 0, 0)
3rd row: (0, 0, 0)
The processed image is shifted down by one pixel, whereas I expected it to shift upwards by one pixel. This result indicates that no horizontal or vertical flipping is done before calculating (as if it were doing correlation).
I thought there might be a fault in the program, so I looked around and found that Adobe Flex 3 and GIMP do this as well.
I don't understand; is there something that I have missed?
Appreciate any help or feedback.
I guess the programs you tried implement correlation instead of convolution.
I've tried your filter in Mathematica using the ImageFilter function; the result is shifted upwards as expected.
I've also tried it in Octave (an open source Matlab clone):
imfilter([1,1,1,1,1;
2,2,2,2,2;
3,3,3,3,3;
4,4,4,4,4;
5,5,5,5,5],
[0,1,0;
0,0,0;
0,0,0],"conv")
("conv" means convolution - imfilter's default is correlation). Result:
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
0 0 0 0 0
Note that the last row is different. That's because different implementations use different padding (by default). Mathematica uses constant padding for ImageConvolve, no padding for ListConvolve. Octave's imfilter uses zero padding.
Also note that (as belisarius mentioned) the result of a convolution can be smaller than, the same size as, or larger than the source image. (I've read the terms "valid", "same size" and "full" convolution in the Matlab and IPPI documentation, but I'm not sure if that's standard terminology.) The idea is that the summation can be performed in one of three ways (a short sketch after the list shows them in practice):
only over the source image pixels where the kernel is completely inside the image. In that case, the result is smaller than the source image.
over every source pixel. In that case, the result has the same size as the source image. This requires padding at the borders.
over every pixel where any part of the kernel is inside the source image. In that case, the result image is larger than the source image. This also requires padding at the borders.
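These three cases correspond to the mode argument in many libraries; for example with SciPy (a sketch mirroring the Octave example above):

import numpy as np
from scipy.signal import convolve2d

img = np.array([[1, 1, 1, 1, 1],
                [2, 2, 2, 2, 2],
                [3, 3, 3, 3, 3],
                [4, 4, 4, 4, 4],
                [5, 5, 5, 5, 5]])
kernel = np.array([[0, 1, 0],
                   [0, 0, 0],
                   [0, 0, 0]])

full  = convolve2d(img, kernel, mode='full')   # 7x7: kernel anywhere it overlaps the image
same  = convolve2d(img, kernel, mode='same')   # 5x5: same size as the source, zero padded
valid = convolve2d(img, kernel, mode='valid')  # 3x3: kernel completely inside the image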
Please also note that, depending on the implementation, the result may not have the same dimensions as the source (see the padding discussion above). So the "shifting" is not real, as the dimensions are affected.