Difference between contrast stretching and histogram equalization - image-processing

I would like to know the difference between contrast stretching and histogram equalization.
I have tried both using OpenCV and observed the results, but I still have not understood the main differences between the two techniques. Insights would be much appreciated.

Let's define contrast first.
Contrast is a measure of the “range” of an image, i.e. how spread out its intensities are. It has many formal definitions; one well-known definition is Michelson's:
contrast = (Imax - Imin) / (Imax + Imin)
Contrast is strongly tied to an image’s overall visual quality.
Ideally, we'd like images to use the entire range of values available to them.
Contrast stretching and histogram equalisation have the same goal: making the image use the entire range of values available to it.
But they use different techniques.
Contrast stretching works like a mapping:
it maps the minimum intensity in the image to the minimum value of the range (84 ==> 0 in the example above).
In the same way, it maps the maximum intensity in the image to the maximum value of the range (153 ==> 255 in the example above).
This is why contrast stretching is unreliable: if just two pixels already have intensities 0 and 255, it changes nothing at all.
However, a better approach is histogram equalisation, which uses the probability distribution of the intensities. You can learn the steps here
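To make the difference concrete, here is a minimal Python/OpenCV sketch (the file name and the contrast_stretch helper are my own, not part of the answer above) that applies both techniques to the same grayscale image; stretching is a linear, invertible remapping of [min, max] onto [0, 255], while equalisation redistributes intensities according to the cumulative histogram.

import cv2
import numpy as np

def contrast_stretch(gray):
    # Linearly map the image's [min, max] onto [0, 255].
    lo, hi = int(gray.min()), int(gray.max())
    if hi == lo:                      # flat image, nothing to stretch
        return gray.copy()
    stretched = (gray.astype(np.float32) - lo) * 255.0 / (hi - lo)
    return stretched.astype(np.uint8)

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

stretched = contrast_stretch(gray)    # linear mapping, histogram shape preserved
equalized = cv2.equalizeHist(gray)    # nonlinear mapping based on the CDF

for name, img in [("original", gray), ("stretched", stretched), ("equalized", equalized)]:
    print(name, "min/max:", img.min(), img.max())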

I came across the following points after some reading.
Contrast stretching is all about increasing the difference between the maximum and minimum intensity values in an image. All the remaining intensity values are spread out over this range.
Histogram equalization is about modifying the intensity values of all the pixels in the image such that the histogram is "flattened" (in reality, the histogram can't be exactly flattened, there would be some peaks and some valleys, but that's a practical problem).
In contrast stretching, there exists a one-to-one relationship of the intensity values between the source image and the target image i.e., the original image can be restored from the contrast-stretched image.
However, once histogram equalization is performed, there is no way of getting back the original image.

In histogram equalization, you want to flatten the histogram into a uniform distribution.
In contrast stretching, you manipulate the entire range of intensity values, much like what you do in normalization.

Contrast stretching is a linear normalization that stretches an arbitrary interval of the intensities of an image and fits the interval to another arbitrary interval (usually the target interval is the possible minimum and maximum of the image, such as 0 and 255).
Histogram equalization is a nonlinear normalization that stretches the regions of the histogram with frequently occurring intensities and compresses the regions with rarely occurring intensities.

I think that contrast stretching broadens the histogram of the image intensity levels, so that the input intensity range is mapped onto the full output intensity range.
Histogram equalization, on the other hand, maps all of the pixels to the full range according to the cumulative distribution function or probability.

Contrast is the difference between maximum and minimum pixel intensity.
Both methods are used to enhance contrast; more precisely, they adjust image intensities to enhance contrast.
During histogram equalization the overall shape of the histogram changes, whereas in contrast stretching the overall shape of the histogram remains the same.

Related

Workflow to clean badly scanned sheet music

I am looking for a workflow that would clean (and possibly straighten) old and badly scanned images of musical scores (like the one below).
I've tried denoising, Hough filters, and ImageMagick geometry filters, and I am struggling to identify the series of filters that removes the scanner noise/bias.
Just some quick ideas:
Remove grayscale noise: Do a low pass filter (darks), since the music is darker than a lot of the noise. Remaining noise is mostly vertical lines.
Rotate image: Sum grayscale values for each column of the image. You'll get a vector with the total pixel lightness in each column. Use gradient descent or a search over the rotation of the image (within some bounds like +/-15deg rotation) to maximize the variance of that vector (a brute-force sketch of this search is shown after these steps). The idea here is that the vertical noise lines indicate vertical alignment, so we want the columns of the image to align with these noise lines (= maximized variance).
Remove vertical line noise: After rotation, take median value of each column. The greater the distance (squared difference) a pixel is from that median darkness, the more confident we are it is its true color (e.g. a pure white or black pixel when vertical noise was gray). Since noise is non-white, you could try blending this distance by the whiteness of the median for an alternative confidence metric. Ideally, I think here you'd train some 7x7x2 convolution filter (2 channels being pixel value and distance from median) to estimate true value of the pixel. That would be the most minimal machine learning approach, not using some full-fledged NN. However, given your lack of training data, we'll have to come up with our own heuristic for what the true pixel value is. You likely will need to play around with it, but here's what I think might work:
Set some threshold of confidence; above that threshold we take the value as is. Below the threshold, set to white (the binary expected pixel value for the entire page).
For all values below threshold, take the max confidence value within a +/-2 pixels L1 distance (e.g. 5x5 convolution) as that pixel's value. Seems like features are separated by at least 2 pixels, but for lower resolutions that window size may need to be adjusted. Since white pixels may end up being more confident overall, you could experiment with prioritizing darker pixels (increase their confidence somehow).
Clamp the image contrast and maybe run another low pass filter.
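For the rotation step above, a brute-force Python/OpenCV sketch of the "maximize the variance of the column sums" idea could look like this (the file name, angle step, and search bounds are illustrative choices, not part of the original suggestion):

import cv2
import numpy as np

def rotate(gray, angle_deg):
    # Rotate around the image centre, padding with white.
    h, w = gray.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(gray, m, (w, h), borderValue=255)

def best_rotation(gray, bound=15.0, step=0.25):
    # Exhaustively search for the angle that maximizes the variance of column sums.
    best_angle, best_var = 0.0, -1.0
    for angle in np.arange(-bound, bound + step, step):
        col_sums = rotate(gray, angle).sum(axis=0)
        if col_sums.var() > best_var:
            best_angle, best_var = angle, col_sums.var()
    return best_angle

gray = cv2.imread("score.png", cv2.IMREAD_GRAYSCALE)
print("estimated rotation:", best_rotation(gray))

Gradient descent (or a coarse-to-fine search) would be faster, but the exhaustive version is easier to verify.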

How bilinear interpolation works when down scaling?

I can clearly understand how bilinear interpolation works when upscaling an image, like filling in values by taking the 4 nearest neighbours, but I can't understand how it works when downscaling the image. It would mean a lot to me if someone could clarify this.
Scaling an image requires mapping pixels from the input to pixels on the output. If those pixel coordinates don't map to an integer, interpolation is required to estimate what the pixel value would have been. The "Bi" part of bilinear means it's linear interpolation applied in two dimensions independently. If for example output pixel 2,3 needs to come from input coordinates 1.5,7.2 you would interpolate in the X direction by taking 0.5 of each of the pixels at 1.0 and 2.0, then interpolate in the Y direction by taking 0.8 of the pixel at 7.0 and 0.2 of the pixel at 8.0. Usually these operations are combined into a single set of equations, but they can be applied separately if needed.
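As a small illustration of that worked example, here is a NumPy sketch (the helper name is mine) that samples an image at the fractional input coordinate (1.5, 7.2) exactly as described: interpolate in X on the two surrounding rows, then in Y between the two results.

import numpy as np

def bilinear_sample(img, x, y):
    # Sample img at fractional (x, y); x indexes columns, y indexes rows.
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, img.shape[1] - 1), min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]       # X interpolation, row y0
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]    # X interpolation, row y1
    return (1 - fy) * top + fy * bottom                   # Y interpolation

img = np.arange(100, dtype=np.float32).reshape(10, 10)
print(bilinear_sample(img, 1.5, 7.2))   # value for an output pixel mapped to input (1.5, 7.2)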
Bilinear is a poor choice for downscaling because it leads to aliasing artifacts. Aliasing happens when the image contains spatial frequencies beyond the Nyquist limit of the new, coarser sampling grid, and high-frequency detail turns into low-frequency artifacts. You can minimize this by blurring the image before you downscale it, or you can choose an interpolation algorithm that incorporates some low-pass filtering.
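If you still want to shrink with a plain resize, the following OpenCV sketch shows both workarounds; the scale factor and the sigma rule of thumb are my own illustrative choices, not prescribed values.

import cv2

img = cv2.imread("large.png")
h, w = img.shape[:2]
scale = 0.25
new_size = (int(w * scale), int(h * scale))

# Naive bilinear downscale: prone to aliasing on fine detail.
naive = cv2.resize(img, new_size, interpolation=cv2.INTER_LINEAR)

# Low-pass filter first, then downscale (sigma roughly matched to the shrink factor).
blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=0.5 / scale)
prefiltered = cv2.resize(blurred, new_size, interpolation=cv2.INTER_LINEAR)

# Or use an interpolation mode that averages over the source area.
area = cv2.resize(img, new_size, interpolation=cv2.INTER_AREA)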

Relation between imageJ blur radius and OpenCV Gaussian Blur sigma

I am trying to blur an ROI in an image using a Gaussian filter in the ImageJ software.
I am getting the desired result with a blur radius of 9 in ImageJ.
Now I am trying to write the corresponding OpenCV C++ application to do the same operations I did with ImageJ.
The Gaussian Blur signature in openCV is as below:
C++: void GaussianBlur(InputArray src, OutputArray dst, Size ksize, double sigmaX, double sigmaY=0, int borderType=BORDER_DEFAULT )
What is the sigmaX and sigmaY corresponding to ImageJ blur radius of 9?
I tried many resources such as:
Blur Radius
but I am not getting the same results with OpenCV.
Could you please elaborate on how the results are "not the same"?
The blur radius in ImageJ is defined as "'Radius' means the radius of decay to exp(-0.5) ~ 61%, i.e. the standard deviation sigma of the Gaussian" (coming from ImageJ documentation : https://imagej.nih.gov/ij/developer/api/ij/plugin/filter/GaussianBlur.html#GaussianBlur--)
I see no reason why it should not be implemented the same way in OpenCV.
However, I also observe these differences between the ImageJ and OpenCV Gaussian blurs.
While for the moment I have no solution to make them absolutely identical, I managed to get them closer, and can see one potential difference and one certain difference in implementation:
Kernel size (potential difference):
Are you aware that kernel size and Gaussian radius are two different things? Kernel size is the size of the kernel applied to the image (3*3, 5*5, etc.), but inside this kernel a Gaussian with any radius can theoretically exist. However, the kernel size is often chosen such that on the kernel borders the Gaussian function has decayed to about zero.
That said, ImageJ automatically chooses the kernel size for you depending on the radius you choose, in order to fulfil the "Gaussian decays to zero on the borders" condition. The OpenCV function also does that if you set sigma to your desired radius and ksize to zero. The question is: "do they both do it the same way?"
ImageJ's implementation of this is trickier than you might think: "In ImageJ, the size of the kernel actually used depends on the accuracy needed: With sigma=1, for 16-bit and float images the kernel is 9 pixels wide (which gives 9x9 for a 2D image), but for 8-bit or RGB images it is only 7 pixels wide because there is no need for a very high accuracy if there are only 256 different values. For large values of sigma, the situation is more complex: For sigma >= 8, the data are first downscaled, then the Gaussian Blur is applied, and interpolation is used for upscaling to the original number of data points. The downscaling and interpolation algorithms are specially designed for best accuracy.", etc. (coming from the ImageJ forum; I can't post the link since I don't have enough reputation, but just google this quote if you want the source)
I do not know if OpenCV does such operations or if it computes the kernel size differently, thus giving different results. (couldn't find it with Google).
Borders (difference for sure): As you probably know, the Gaussian filter goes over every pixel in the image and computes a new value for this pixel based on its neighbors. But what about the pixels close to the borders, where the Gaussian kernel is wider than their distance from the image's border? How do algorithms handle it? By inspecting my images closer, I found that the main differences between the OCV implementation and the IJ one were on the border pixels.
Well, it turns out ImageJ and OpenCV handle these pixels differently:
For the ImageJ Gaussian, "Like all convolution operations in ImageJ, it assumes that out-of-image pixels have a value equal to the nearest edge pixel." (from the same ImageJ doc as above).
However, OpenCV lets you choose other options, and the default one, called BORDER_DEFAULT in the OpenCV call, is BORDER_REFLECT_101 (http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_core/py_basic_ops/py_basic_ops.html) (at least I think it is; it is the default border for another method using borders, so I would assume it is also the default for the Gaussian). BORDER_REFLECT_101 sort of "mirrors" the borders (gfedcb|abcdefgh, see link).
To get closer to ImageJ's behaviour (aaaaaaaa|abcdefgh), pass BORDER_REPLICATE instead of BORDER_DEFAULT. With this, I get closer results between the two implementations (though not exactly the same; I will keep investigating and edit my answer if I find more clues).
[Note : I am working in Python2.7 (not C++) and OpenCV 3, but I don't think it has an impact on this problem]
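Putting the two points together, a minimal Python/OpenCV call that should get you closer to the ImageJ radius-9 blur (assuming the radius really is the Gaussian sigma, as the ImageJ documentation states) would be:

import cv2

img = cv2.imread("input.png")
radius = 9   # ImageJ "radius", i.e. the standard deviation sigma

# ksize=(0, 0) lets OpenCV derive the kernel size from sigma;
# BORDER_REPLICATE mimics ImageJ's "nearest edge pixel" behaviour
# instead of the default BORDER_REFLECT_101.
blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=radius, sigmaY=radius,
                           borderType=cv2.BORDER_REPLICATE)

How OpenCV derives the kernel size from sigma may still differ from ImageJ's accuracy-dependent choice, so small differences can remain.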

Simple way to check if an image bitmap is blur

I am looking for a "very" simple way to check if an image bitmap is blurry. I do not need an accurate and complicated algorithm involving FFTs, wavelets, etc. Just a very simple idea, even if it is not accurate.
I've thought of computing the average Euclidean distance between pixel (x,y) and pixel (x+1,y), considering their RGB components, and then using a threshold, but it works very badly. Any other idea?
Don't calculate the average differences between adjacent pixels.
Even when a photograph is perfectly in focus, it can still contain large areas of uniform colour, like the sky for example. These will push down the average difference and mask the details you're interested in. What you really want to find is the maximum difference value.
Also, to speed things up, I wouldn't bother checking every pixel in the image. You should get reasonable results by checking along a grid of horizontal and vertical lines spaced, say, 10 pixels apart.
Here are the results of some tests with PHP's GD graphics functions using an image from Wikimedia Commons (Bokeh_Ipomea.jpg). The Sharpness values are simply the maximum pixel difference values as a percentage of 255 (I only looked in the green channel; you should probably convert to greyscale first). The numbers underneath show how long it took to process the image.
If you want them, here are the source images I used:
original
slightly blurred
blurred
Update:
There's a problem with this algorithm in that it relies on the image having a fairly high level of contrast as well as sharp focused edges. It can be improved by finding the maximum pixel difference (maxdiff), and finding the overall range of pixel values in a small area centred on this location (range). The sharpness is then calculated as follows:
sharpness = (maxdiff / (offset + range)) * (1.0 + offset / 255) * 100%
where offset is a parameter that reduces the effects of very small edges so that background noise does not affect the results significantly. (I used a value of 15.)
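For reference, here is a rough Python/NumPy re-implementation of this metric. The original tests used PHP's GD functions and only the green channel; this sketch works on a grayscale version and samples the same 10-pixel grid of rows and columns, so the numbers will not match exactly.

import cv2
import numpy as np

def sharpness(path, step=10, offset=15):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.int32)
    h, w = gray.shape

    # Largest adjacent-pixel difference along a coarse grid of rows and columns.
    best, best_yx = -1, (0, 0)
    for y in range(0, h, step):
        d = np.abs(np.diff(gray[y, :]))
        x = int(d.argmax())
        if d[x] > best:
            best, best_yx = int(d[x]), (y, x)
    for x in range(0, w, step):
        d = np.abs(np.diff(gray[:, x]))
        y = int(d.argmax())
        if d[y] > best:
            best, best_yx = int(d[y]), (y, x)

    # Range of pixel values in a 9x9 window centred on the strongest edge.
    y, x = best_yx
    window = gray[max(0, y - 4):y + 5, max(0, x - 4):x + 5]
    rng = int(window.max() - window.min())

    return (best / (offset + rng)) * (1.0 + offset / 255.0) * 100.0

print(sharpness("photo.jpg"))   # below roughly 40% the photo is probably out of focus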
This produces fairly good results. Anything with a sharpness of less than 40% is probably out of focus. Here are some examples (the locations of the maximum pixel difference and the 9×9 local search areas are also shown for reference).
The results still aren't perfect, though. Subjects that are inherently blurry will always result in a low sharpness value.
Bokeh effects can produce sharp edges from point sources of light, even when they are completely out of focus.
You commented that you want to be able to reject user-submitted photos that are out of focus. Since this technique isn't perfect, I would suggest that you instead notify the user if an image appears blurry instead of rejecting it altogether.
I suppose that, philosophically speaking, all natural images are blurry... How blurry, and to what extent, is something that depends upon your application. Broadly speaking, the blurriness or sharpness of images can be measured in various ways. As a first easy attempt I would check the energy of the image, defined as the normalised sum of the squared pixel values:
E = (1/N) * Σ I², where I is the image and N the number of pixels (defined for grayscale images)
First you may apply a Laplacian of Gaussian (LoG) filter to detect the "energetic" areas of the image and then check the energy. The blurry image should show considerably lower energy.
See an example in MATLAB using a typical grayscale lena image:
This is the original image
This is the blurry image, blurred with a Gaussian filter
This is the LoG image of the original
And this is the LoG image of the blurry one
If you just compute the energy of the two LoG images you get:
E_or = 1265 (original) and E_bl = 88 (blurry)
which is a huge amount of difference...
Then you just have to select a threshold to judge which amount of energy is good for your application...
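The example above is in MATLAB, but the same check is easy to sketch in Python/OpenCV (the sigma values and file names are illustrative; the LoG is approximated here as a Gaussian blur followed by a Laplacian):

import cv2
import numpy as np

def log_energy(gray, sigma=2.0):
    # Energy of the Laplacian-of-Gaussian response, normalised by pixel count.
    smoothed = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma)
    log = cv2.Laplacian(smoothed, cv2.CV_32F)
    return float((log ** 2).sum()) / log.size

sharp = cv2.imread("lena.png", cv2.IMREAD_GRAYSCALE)
blurry = cv2.GaussianBlur(sharp, (0, 0), 3)

print(log_energy(sharp), log_energy(blurry))   # the blurry image gives much lower energy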
Calculate the average L1 distance of adjacent pixels:
N1=1/(2*N_pixel) * sum( abs(p(x,y)-p(x-1,y)) + abs(p(x,y)-p(x,y-1)) )
then the average L2 distance:
N2= 1/(2*N_pixel) * sum( (p(x,y)-p(x-1,y))^2 + (p(x,y)-p(x,y-1))^2 )
then the ratio N2 / (N1*N1) is a measure of blurriness. This is for grayscale images, for color you do this for each channel separately.
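A direct NumPy transcription of these formulas might look like this (the function name is mine):

import numpy as np

def blurriness_ratio(gray):
    # N2 / N1^2 from the average L1 and squared differences of adjacent pixels.
    p = gray.astype(np.float64)
    dx = p[:, 1:] - p[:, :-1]          # p(x,y) - p(x-1,y)
    dy = p[1:, :] - p[:-1, :]          # p(x,y) - p(x,y-1)
    n = p.size
    n1 = (np.abs(dx).sum() + np.abs(dy).sum()) / (2 * n)
    n2 = ((dx ** 2).sum() + (dy ** 2).sum()) / (2 * n)
    return n2 / (n1 * n1)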

Threshold to amplify black lines

Given an image (like the one given below) I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried with two thresholding functions. The problem is I can't get the perfect edges using either of these functions. Any help would be greatly appreciated.
The filters I have tried are the Euclidean distance in the RGB and HSV colour spaces.
Sample image:
Here it is after running an RGB threshold filter (at 40%; there are more artefacts after this).
Here it is after running an HSV threshold filter (at 30% the paths become barely visible, but it is clearly unusable because of the noise).
The code I am using is pretty straightforward: change the input image to the appropriate colour space and check the Euclidean distance to the colour black.
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. The best results come with a window larger than 11x11; don't forget to choose a negative value for the threshold constant (an OpenCV sketch follows the pseudocode below).
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel )
Pixel_Binary = 1;
else
Pixel_Binary = 0;
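In OpenCV this corresponds roughly to cv2.adaptiveThreshold on the V channel. The block size and constant below are my own illustrative values; note that with OpenCV's convention the local threshold is (mean - C), so I use a small positive C here to keep uniform background white while only clearly dark pixels go to black. Adjust the sign and magnitude of C to your data.

import cv2

img = cv2.imread("map.png")
v = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:, :, 2]

# 15x15 local mean; a pixel becomes white only if it exceeds (local mean - C),
# so dark lines drop to 0 while the lighter background stays at 255.
binary = cv2.adaptiveThreshold(v, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 15, 5)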
Due to the noise and the illumination variation you may need an adaptive local thresholding, thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do mean or median local thresholding; I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more information on these techniques.
To make sure the thresholding was working fine, I skeletonized it to see if there is a line break. This skeleton may be the one needed for further processing.
To get rid of the remaining noise you can just find the longest connected component in the skeletonized image.
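A rough sketch of these last two steps, using scikit-image for the skeleton and OpenCV for the connected components (the file name is a placeholder, and I use the largest-area component as a stand-in for "longest"):

import cv2
import numpy as np
from skimage.morphology import skeletonize

# binary: uint8 image in which the lines of interest are 255 and background is 0
binary = cv2.imread("thresholded.png", cv2.IMREAD_GRAYSCALE)

skeleton = skeletonize(binary > 0).astype(np.uint8)

# Keep only the largest connected component (8-connectivity by default)
# to drop the small specks of remaining noise.
count, labels, stats, _ = cv2.connectedComponentsWithStats(skeleton)
if count > 1:
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    cleaned = np.where(labels == largest, 255, 0).astype(np.uint8)
else:
    cleaned = skeleton * 255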
Thank you.
You probably want to do this as a three-step operation (a sketch of the steps follows below).
Use leveling, not just thresholding: take the input and scale the intensities (gamma correct) with parameters that simply dull the mid-tones, without removing the darks or the lights (your RGB threshold is too strong, for instance; you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
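One possible reading of these three steps in Python/OpenCV is below. The gamma exponent and the final Otsu threshold are my own choices, and since the kernel given is a smoothing kernel, I interpret "edge-detect" here as taking the absolute difference between the image and its filtered version.

import cv2
import numpy as np

img = cv2.imread("map.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# 1) Leveling: a gamma-style curve that dulls the mid-tones without clipping
#    the darks or the lights (the exponent is an illustrative choice).
leveled = np.power(img, 1.5)

# 2) Convolve with the 5x5 kernel from the answer (normalised to sum to 1) and
#    take the absolute difference from the original to emphasise transitions.
k = np.array([[1, 2, 3, 2, 1],
              [2, 3, 4, 3, 2],
              [3, 4, 5, 4, 3],
              [2, 3, 4, 3, 2],
              [1, 2, 3, 2, 1]], dtype=np.float32)
k /= k.sum()
edges = np.abs(leveled - cv2.filter2D(leveled, -1, k))

# 3) Threshold the result to get the final binary image.
_, binary = cv2.threshold((edges * 255).astype(np.uint8), 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)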
You could try a black top-hat transform. This involves subtracting the image from the closing of the image. I used a structuring element of window size 11 and a constant threshold of 0.1 (25.5 on a 255 scale).
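A minimal OpenCV sketch of this, using the 11x11 structuring element and the ~25.5 threshold quoted above (the file name is a placeholder):

import cv2

gray = cv2.imread("map.png", cv2.IMREAD_GRAYSCALE)

# Black top-hat: closing(image) - image, with an 11x11 structuring element.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (11, 11))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)

# Constant threshold of 0.1 on a 0-1 scale, i.e. about 25.5 out of 255.
_, binary = cv2.threshold(blackhat, 25, 255, cv2.THRESH_BINARY)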
You should get something like:
Which you can then easily threshold:
Best of luck.
