Erosion/Dilation for binary and grayscale images - image-processing

I am trying to work out the difference between Erosion and Dilation for binary and grayscale images.
As far as I know, this is erosion/dilation for binary images...
Erosion: If every pixel corresponding to an SE index that has 1 is a 1, output a 1. Otherwise 0.
Dilation: If at least one pixel corresponding to an SE index that has 1 is a 1, output a 1. Otherwise 0.
My question is, how does this work for 16-bit (0, 65535) grayscale images?

So what we have to do is to create an structual Element, that could be for example:
The formula says for dilation says:
image http://utam.gg.utah.edu/tomo03/03_mid/HTML/img642.png
and for erosion:
image http://utam.gg.utah.edu/tomo03/03_mid/HTML/img643.png
that means with have to take the maximum or minumum of each kernel values in the image and add 10 to it. If we have for example:
it goes to using dilation:
How you can see you just look at pixel position x,y take the center and add 10 to it. Then you check the neighbors if the computed value is the maximum. If it is a new maximum the pixel value get replaced, when not the pixel value stays. Hope it is clear for erosion you just take the minimum.

Related

How the Input Image Pixel Values Scaled in YOLO-V4?

I have an issue with something vague for me regarding the input data preprocessing in YOLO-V4,
If the input image is a grayscale image of 16-bits per pixel, i.e. range of pixel values [0,2^16) instead of [0,2^8), it is mentioned that they are scaled so I want to ask about:
1- These values are scaled to be within -1 to 1 or 0 to 1?
2- Which method is used in scaling (where can I find its piece of code)?
3- Is the scaling done using the max value per image or the max depth per image, i.e in my case does the max used in the scaling will be 2^16 or the maximum value in the image for example 2000 or whatever?
Thanks in advance,
(1) - from 0 to 1.
Now scaling works as v = pix / 255.
Here

How to downsample input label in binary high resolution segmentation?

I am going to solve a binary high resolution segmentation problem. Positive pixels are marked as same value while negative pixels are all zero. The input image is scaled to 1/4 by bi-cubic interpolation.
After scaling, the pixel values of positive labels are not all the same. So how to process these label images to make it still a binary segmentation problem? Just set the pixels which are larger than 0 to positive or set the pixels which larger than a threshold to positive?
If the answer is the latter one, how to set the threshold?
I suggest you do not use built-in resize functions, such as zoom or imresize. Suppose you have a binary mask of size 225 * 225, then the central point is (113, 113), start from this central point, sub-sample the points in all four directions with equal steps,(like 4). And finally you will find you have 4 different sample ways, average them.

Calculate the perceived brightness of an image

I wanna calculate the perceived brightness of an image and classify the image into dark, neutral and bright. And I find one problem here!
And I quote Lakshmi Narayanan's comment below. I'm confused with this method. What does "the average of the hist values from 0th channel" mean here? the 0th channel refer to gray image or value channel in hsv image? Moreover, what's the theory of that method?
Well, for such a case, I think the hsv would be better. Or try this method #2vision2. Compute the laplacian of the gray scale of the image. obtain the max value using minMacLoc. call it maxval. Estimate your sharpness/brightness index as - (maxval * average V channel values) / (average of the hist values from 0th channel), as said above. This would give you certain values. low bright images are usually below 30. 30 - 50 can b taken as ok images. and above 50 as bright images.
If you have an RGB color image you can get the brightness by converting it to another color space that separates color from intensity information like HSV or LAB.
Gray images already show local "brightness" so no conversion is necessary.
If an image is perceived as bright depends on many things. Mainly your display device, reference images, contrast, human...
Using a few intensity statistics values should give you an ok classification for one particular display device.

How to transform filter when using FFT to do 2d convolution?

I want to use FFT to accelerate 2D convolution. The filter is 15 x 15 and the image is 300 x 300. The filter's size is different with image so I can not doing dot product after FFT. So how to transform the filter before doing FFT so that its size can be matched with image?
I use the convention that N is kernel size.
Knowing the convolution is not defined (mathematically) on the edges (N//2 at each end of each dimension), you would loose N pixels in totals on each axis.
You need to make room for convolution : pad the image with enough "neutral values" so that the edge cases (junk values inserted there) disappear.
This would involve making your image a 307x307px image (with suitable padding values, see next paragraph), which after convolution gives back a 300x300 image.
Popular image processing libraries have this already embedded : when you ask for a convolution, you have extra arguments specifying the "mode".
Which values can we pad with ?
Stolen with no shame from Numpy's pad documentation
'constant' : Pads with a constant value.
'edge' : Pads with the edge values of array.
'linear_ramp' : Pads with the linear ramp between end_value and the arraydge value.
'maximum' :
Pads with the maximum value of all or part of the
vector along each axis.
'mean'
Pads with the mean value of all or part of the
vector along each axis.
'median'
Pads with the median value of all or part of the
vector along each axis.
'minimum'
Pads with the minimum value of all or part of the
vector along each axis.
'reflect'
Pads with the reflection of the vector mirrored on
the first and last values of the vector along each
axis.
'symmetric'
Pads with the reflection of the vector mirrored
along the edge of the array.
'wrap'
Pads with the wrap of the vector along the axis.
The first values are used to pad the end and the
end values are used to pad the beginning.
It's up to you, really, but the rule of thumb is "choose neutral values for the task at hand".
(For instance, padding with 0 when doing averaging makes little sense, because 0 is not neutral in an average of positive values)
it depends on the algorithm you use for the FFT, because most of them need to work with images of dyadic dimensions (power of 2).
Here is what you have to do:
Padding image: center your image into a bigger one with dyadic dimensions
Padding kernel: center you convolution kernel into an image with same dimensions as step 1.
FFT on the image from step 1
FFT on the kernel from step 2
Complex multiplication (Fourier space) of results from steps 3 and 4.
Inverse FFT on the resulting image on step 5
Unpadding on the resulting image from step 6
Put all 4 blocs into the right order.
If the algorithm you use does not need dyadic dimensions, then steps 1 is useless and 2 has to be a simple padding with the image dimensions.

Threshold to amplify black lines

Given an image (Like the one given below) I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried with two thresholding functions. The problem is I cant get the perfect edges using either of these functions. Any help would be greatly appreciated.
The filters I have tried are, the Euclidean distance in the RGB and HSV spaces.
Sample image:
Here it is after running an RGB threshold filter. (40% it more artefects after this)
Here it is after running an HSV threshold filter. (at 30% the paths become barely visible but clearly unusable because of the noise)
The code I am using is pretty straightforward. Change the input image to appropriate color spaces and check the Euclidean distance with the the black color.
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. Best results would come with higher than 11x11 size window, don't forget to choose a negative value for the threshold.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel )
Pixel_Binary = 1;
else
Pixel_Binary = 0;
Due to the noise and the illumination variation you may need an adaptive local thresholding, thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do the mean or the median local thresholding, I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to : http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more
information on this techniques.
To make sure the thresholding was working fine, I skeletonized it to see if there is a line break. This skeleton may be the one needed for further processing.
To get ride of the remaining noise you can just find the longest connected component in the skeletonized image.
Thank you.
You probably want to do this as a three-step operation.
use leveling, not just thresholding: Take the input and scale the intensities (gamma correct) with parameters that simply dull the mid tones, without removing the darks or the lights (your rgb threshold is too strong, for instance. you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
You could try a black top-hat transform. This involves substracting the Image from the closing of the Image. I used a structural element window size of 11 and a constant threshold of 0.1 (25.5 on for a 255 scale)
You should get something like:
Which you can then easily threshold:
Best of luck.

Resources