What's the theory behind computing variance of an image? - opencv

I am trying to compute the blurriness of an image by using LaplacianFilter.
According to this article: https://www.pyimagesearch.com/2015/09/07/blur-detection-with-opencv/ I have to compute the variance of the output image. The problem is that I don't understand, conceptually, how to compute the variance of an image.
Every pixel has 4 values, one for each color channel, so I can compute the variance of every channel, but then I get 4 values (or even 16 by computing the variance-covariance matrix), whereas in the OpenCV example they end up with only 1 number.
After computing that number, they just play with the threshold in order to make a binary decision, whether the image is blurry or not.
PS: I am by no means an expert on this topic, so my statements may not make sense. If so, please feel free to edit the question.

One-sentence description:
The blurred image's edges are smoothed, so the variance is small.
1. How the variance is calculated.
The core function of the post is:
def variance_of_laplacian(image):
    # compute the Laplacian of the image and then return the focus
    # measure, which is simply the variance of the Laplacian
    return cv2.Laplacian(image, cv2.CV_64F).var()
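A minimal usage sketch, assuming a hypothetical input file and threshold (both are placeholders, not taken from the post):

import cv2

image = cv2.imread("example.jpg")                      # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)         # the post works on a grayscale image
fm = variance_of_laplacian(gray)                       # a single number: variance of the Laplacian
print("blurry" if fm < 100.0 else "not blurry", fm)    # 100.0 is an arbitrary threshold to tune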
Since OpenCV-Python uses numpy.ndarray to represent images, let's take a look at numpy.var:
Help on function var in module numpy.core.fromnumeric:

var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<class 'numpy._globals._NoValue'>)
    Compute the variance along the specified axis.
    Returns the variance of the array elements, a measure of the spread of a distribution.
    The variance is computed for the flattened array by default, otherwise over the specified axis.
2. Applying it to the image
That is to say, the variance is calculated on the flattened Laplacian image, i.e. on a 1-D array.
To calculate the variance of an array x:
var = mean(abs(x - x.mean())**2)
For example:
>>> x = np.array([[1, 2], [3, 4]])
>>> x.var()
1.25
>>> np.mean(np.abs(x - x.mean())**2)
1.25
The Laplacian image is an edge image. Blur the source image with GaussianBlur using different radii, run the Laplacian filter on each result, and calculate the variances (a sketch of this experiment follows):
The more blurred the image, the smoother its edges and the smaller the variance.
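A hedged sketch of that experiment (the file name is a placeholder): blur the same grayscale image with increasing sigma and watch the variance of the Laplacian fall.

import cv2

gray = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)      # hypothetical input image
for sigma in (1, 3, 5, 9):
    blurred = cv2.GaussianBlur(gray, (0, 0), sigma)          # kernel size derived from sigma
    print(sigma, cv2.Laplacian(blurred, cv2.CV_64F).var())   # the variance shrinks as sigma grows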

First things first: if you look at the tutorial you linked, they convert the image to greyscale, so it has only 1 channel and therefore 1 variance. You could compute a variance per channel and combine them with a more complicated formula, or just take the variance over all the numbers. I think the author converts to greyscale because it is a nice way of fusing the information, and one of the papers the author cites actually says that
A well focused image is expected to have a high variation in grey
levels.
The author of the tutorial actually explains it in a simple way. First, think about what the Laplacian filter does: it highlights the well-defined edges. Here is an example using the grid of pictures he had.
As you can see, the blurry images barely have any edges, while the focused ones have a lot of responses. Now, what happens when you calculate the variance? Imagine the case where white is 255 and black is 0. If everything is black (the blurry cases) then the variance is low, but if the image is roughly half white and half black then the variance is high.
However, as the author already said, this threshold is domain dependent. If you take a picture of the sky, even when it is in focus it may have a low variance, since it is quite uniform and does not have very well-defined edges...
I hope this answers your doubts :)

Related

Calculate similarity of picture and its sketch

I'm trying to develop an algorithm that returns a similarity score for two given black-and-white images: an original and a sketch of it drawn by a human:
All original images have the same style, but there is no limited set of them given in advance. Their content can be totally different.
I've tried a few approaches, but none of them has been successful yet:
OpenCV template matching
OpenCV's matchTemplate is not able to calculate a similarity score for the images. It can only tell me the count of matched pixels, and this value is usually quite low because the proportions of the human's sketch are not ideal.
OpenCV feature matching
I failed with this method because I couldn't find good algorithms for extracting significant features from a human's sketch. The algorithms from OpenCV's tutorials are good at extracting corners and blobs as features. But here, in sketches, we have a lot of strokes; each of them produces many insignificant, junk features and leads to fuzzy results.
Neural Network Classification
I also took a look at neural networks. They are good at image classification, but they need a training set for each class, and that is impossible here because the set of possible images is unlimited.
Which methods and algorithms would you use for this kind of task?
METHOD 1
Cosine similarity gives a similarity score in the range [0, 1].
I first converted the images to grayscale and binarized them. I cropped the original image to half its size and excluded the text, as shown below:
I then converted the image arrays to 1-D arrays using flatten() and used the following to compute the cosine similarity (scipy's spatial.distance.cosine returns the cosine distance, so the similarity is 1 minus that):
from scipy import spatial
result = 1 - spatial.distance.cosine(im2, im1)
print(result)
The result I obtained was 0.999999988431, meaning the images are similar to each other by this score.
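For completeness, here is a hedged end-to-end sketch of this method. The file names and the binarization threshold of 127 are placeholders, and both images are assumed to have the same size:

import cv2
from scipy import spatial

# hypothetical inputs: load as grayscale, then binarize (threshold value is illustrative)
g1 = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE)
g2 = cv2.imread("sketch.png", cv2.IMREAD_GRAYSCALE)
_, im1 = cv2.threshold(g1, 127, 255, cv2.THRESH_BINARY)
_, im2 = cv2.threshold(g2, 127, 255, cv2.THRESH_BINARY)

# flatten to 1-D vectors; cosine similarity = 1 - cosine distance
similarity = 1 - spatial.distance.cosine(im1.flatten().astype(float),
                                         im2.flatten().astype(float))
print(similarity)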
EDIT
METHOD 2
I had the time to check out another solution. I figured out that OpenCV's cv2.matchTemplate() function performs the same job.
If you check out THIS DOCUMENTATION PAGE you will come across the different parameters used.
I used the cv2.TM_SQDIFF_NORMED parameter (which gives the normalized square difference between the two images).
res = cv2.matchTemplate(th1, th2, cv2.TM_SQDIFF_NORMED)
print(1 - res)
For the given images I obtained a similarity score of: 0.89689457
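A minimal sketch of how th1 and th2 above could be prepared and compared; the file names and the threshold are placeholders, and both binarized images are assumed to have the same size so that the result is a single value:

import cv2

_, th1 = cv2.threshold(cv2.imread("original.png", cv2.IMREAD_GRAYSCALE), 127, 255, cv2.THRESH_BINARY)
_, th2 = cv2.threshold(cv2.imread("sketch.png", cv2.IMREAD_GRAYSCALE), 127, 255, cv2.THRESH_BINARY)
res = cv2.matchTemplate(th1, th2, cv2.TM_SQDIFF_NORMED)  # 1x1 result when sizes match
print(1 - res[0][0])                                     # closer to 1 means more similar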

Effect of variance (sigma) at gaussian smoothing

I know about Gaussians, variance, and image blurring, and I think I understand the concept of variance in Gaussian blur, but I am still not 100% sure.
I just want to know the role of sigma (the variance) in Gaussian smoothing. That is, what happens when we increase the value of sigma for the same window size, and why does it happen?
It would be really helpful if somebody could point me to some good literature about it. (I already tried a few sources but couldn't find what I am looking for.)
Major confusion:
Higher frequency -> details (e.g. noise),
Lower frequency -> a kind of overview of the image.
I thought that by increasing sigma we are allowing more of the higher frequencies, so we should get more detail, but the opposite is the case: when we increase sigma, the image becomes more blurry.
I think it is best explained in a few steps, first from the signal-processing point of view:
A Gaussian filter is a low-pass filter. Low-pass filters, as their name implies, pass (keep) the low frequencies. When we look at the image in the frequency domain, the highest frequencies occur at the edges, i.e. at places where there is a rapid change in intensity.
The role of sigma in the Gaussian filter is to control the spread around its mean value: the larger the sigma, the more spread is allowed around the mean, and the smaller the sigma, the less spread is allowed around the mean.
Filtering in the spatial domain is done through convolution. It simply means that we apply a kernel to every pixel in the image. For a smoothing kernel such as the Gaussian, the weights have to sum to one (it is derivative kernels, like the Laplacian, whose weights sum to zero).
Now let's put it all together. When we apply a Gaussian filter to an image, we are doing low-pass filtering. But this happens in the discrete domain (image pixels), so we have to sample (quantize) the Gaussian filter to make a Gaussian kernel. When the Gaussian filter has a small sigma it has the steepest peak, so more of the weight is concentrated in the center and less around it. Conversely, a wide Gaussian in the spatial domain (large sigma) corresponds to a narrow Gaussian in the frequency domain, so a larger sigma cuts away more of the high frequencies and blurs the image more.
From the point of view of natural image statistics: scientists in this field have shown that our visual system behaves somewhat like a Gaussian filter in its responses to images. For example, take in a broad scene without paying attention to a specific point: you see a scene with lots of things in it, but the details are not clear. Now look at a specific point in that scene: you see details you did not see before. This is where sigma appears: when you increase sigma you are looking at the broad scene without paying attention to the details, and when you decrease it you get more detail.
I think Wikipedia can help more than me: Low Pass Filters, Gaussian Blur.
Put simply, increasing sigma casts a broader net over the neighboring pixels and decreases the impact of the pixels nearest the pixel of interest, i.e. it makes a blurrier image.
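A tiny sketch of that effect: for a fixed 9-tap window, print the 1-D Gaussian weights for several sigmas and watch the weight spread away from the centre (the window size is an arbitrary choice for illustration).

import cv2

for sigma in (0.5, 1.0, 2.0, 4.0):
    kernel = cv2.getGaussianKernel(9, sigma)     # 9x1 column of weights, normalized to sum to 1
    print(sigma, kernel.ravel().round(3))        # larger sigma -> flatter weights -> stronger blur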

How to crop the roi of the image

In my project I want to crop the ROI of an image. For this I create a map with the regions of interest. Now I want to crop the area which has the most important pixels (black is not important, white is important).
Does someone have an idea how to realize this? I think it is a maximization problem.
The red border in the image below is an example of how I want to crop this image.
If I understood your question correctly, you have computed a value at every point in the image. These values suggest the "importance"/"interestingness"/"saliency" of each point. The matrix/image containing these values is the "map" you are referring to. Your goal is to get bounding boxes for the regions of interest (ROIs) with a high "importance" score.
The way I think you can go about segmenting the ROIs is to apply a Graph Cut based segmentation, computing a "score" at each pixel using your importance map. The result of the segmentation is a binary mask that marks the "important" pixels. Next, run OpenCV's findContours function on this binary mask to get the individual connected components, then use OpenCV's boundingRect function on the contours returned by findContours(...) to get the bounding boxes (a short sketch of these last two steps follows).
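A hedged sketch of those last two steps, assuming you already have the binary mask; the file name and the minimum-area filter are placeholders:

import cv2

mask = cv2.imread("importance_mask.png", cv2.IMREAD_GRAYSCALE)   # 255 marks "important" pixels (assumed given)
# note: OpenCV 3.x returns (image, contours, hierarchy) instead of two values
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h > 100:                    # drop tiny fragments; the threshold is arbitrary
        print(x, y, w, h)              # one candidate crop rectangle per connected component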
The good thing about using a Graph Cut based segmentation algorithm in this way is that it will join up fragmented components i.e. the resulting binary mask will tend not to have pockets of small holes even if your "importance" map is noisy.
One Graph Cut based segmentation algorithm already implemented in OpenCV is the GrabCut algorithm. A quick hack would be to apply it on your "importance" map to get the binary mask I mentioned above. A more sophisticated approach would be to build the foreground and background (color perhaps?) model using your "importance" map and passing it as input to the function. More details on GrabCut in OpenCV can be found here: http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html?highlight=grabcut#void grabCut(InputArray img, InputOutputArray mask, Rect rect, InputOutputArray bgdModel, InputOutputArray fgdModel, int iterCount, int mode)
If you would like greater flexibility, you can hack your own graphcut based segmentation algorithm using the following MRF library. This library allows you to specify your custom objective function in computing the graph cut: http://vision.middlebury.edu/MRF/code/
To use the MRF library, you will need to specify the "cost" at each point in your image indicating whether that point is "foreground" or "background". You can also think of this dichotomy as "important" or "not important" instead of "foreground" vs "background".
The MRF library's goal is to return you a label at each point such that total cost of assigning those labels is as small as possible. Hence, the game is to come up with a function to compute a small cost for points you consider important and large otherwise.
Specifically, the cost at each point is composed of 2 parts: 1) The data term/function and 2) The smoothness term/function. As mentioned earlier, the smaller the data term at each point, the more likely that point will be selected. If your "importance" score s_ij is in the range [0, 1], then a common way to compute your data term would be -log(s_ij).
The smoothness term is a way to suggest whether 2 neighboring pixels p, q should have the same label, i.e. both "foreground", both "background", or one "foreground" and the other "background". Similar to the data cost, you have to construct it such that the cost is small for neighboring pixels with similar "importance" scores, so that they will be assigned the same label. This term is responsible for "smoothing" the resulting mask so that you do not get pixels of low "importance" sprinkled within regions of high "importance" and vice versa. If there are such regions, OpenCV's findContours(...) function mentioned above will return contours for them, which can be filtered out, perhaps by checking their size.
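As a toy illustration of the data-term idea only (this is plain NumPy, not the MRF library's API, and the importance map here is random stand-in data):

import numpy as np

# s: "importance" score at each pixel, assumed to lie in [0, 1]
s = np.clip(np.random.rand(4, 4), 1e-6, 1 - 1e-6)   # random stand-in for the real map
data_cost_important = -np.log(s)                    # small cost where the importance is high
data_cost_unimportant = -np.log(1 - s)              # small cost where the importance is low
# a graph-cut solver would then pick, per pixel, the labeling whose total
# (data + smoothness) cost is minimal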
Details on functions to compute the cost can be found in the GrabCut paper: GrabCut
This blog post provides a bit more detail (and code) on creating your own graphcut segmentation algorithm in OpenCV: http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/
Another paper showing how to perform graph cut segmentation on grayscale images (your case), with better notations, and without the complicated image matting part (not implemented in OpenCV's version) in the GrabCut paper is this: Graph Cuts and Efficient N-D Image Segmentation
Hope this helps.

Determine if an image needs contrasting automatically in OpenCV

OpenCV has a handy cvEqualizeHist() function that works great on faded/low-contrast images.
However, when an already high-contrast image is given, the result is a low-contrast one. I understand the reason: the histogram ends up spread evenly across the range.
Question is - how do I get to know the difference between a low-contrast and a high-contrast image?
I'm operating on grayscale images and setting their contrast properly so that thresholding them won't delete the text I'm supposed to extract (that's a different story).
Suggestions welcome, especially on how to find out whether the majority of the pixels in the image are light gray (which means that histogram equalization should be performed).
Please help!
EDIT: thanks everyone for many informative answers. But the standard deviation calculation was sufficient for my requirements and hence I'm taking that to be the answer to my query.
You can probably just use a simple statistical measure of the image to determine whether an image has sufficient contrast. The variance of the image would probably be a good starting point. If the variance is below a certain threshold (to be empirically determined) then you can consider it to be "low contrast".
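A minimal sketch of that idea; the file name and the threshold of 40 are placeholders to be tuned empirically:

import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
if gray.std() < 40:                                  # low spread of intensities -> "low contrast"
    gray = cv2.equalizeHist(gray)                    # only equalize when the contrast is low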
If you're adjusting contrast just so you can threshold later on, you may be able to avoid the contrast adjustment step if you set your threshold adaptively using Otsu's method.
If you're still interested in finding out the image contrast, then read on.
There are a number of different ways to calculate "contrast". Often, these metrics are applied locally rather than to the entire image, to make the result more sensitive to the image content (a small sketch of this block-based approach follows the example images below):
Divide the image into adjacent, non-overlapping neighborhoods.
Pick neighborhood sizes that approximate the size of the features in your image (e.g. if your main feature is horizontal text, make the neighborhoods tall enough to capture 2 lines of text, and just as wide).
Apply the metric to each neighborhood individually.
Threshold the metric result to separate low- and high-variance blocks. This prevents things like large blank areas of the page from skewing your contrast estimate.
From there, you can use a number of features to determine contrast:
The proportion of high metric blocks to low metric blocks
High metric block mean
Intensity distance between the high and low metric blocks (using means, modes, etc)
This may serve as a better indication of image contrast than global image variance alone. Here's why:
(stddev: 50.6)
(stddev: 7.9)
Both images are perfectly in contrast (the grey background is just there to make it obvious that each is an image), but their standard deviations (and thus variances) are completely different.
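Here is a small sketch of the block-based procedure described above; the file name, block size, and thresholds are all illustrative:

import numpy as np
import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)      # hypothetical input image
bh, bw = 32, 32                                          # neighborhood size: match your feature size
stds = []
for y in range(0, gray.shape[0] - bh + 1, bh):
    for x in range(0, gray.shape[1] - bw + 1, bw):
        stds.append(gray[y:y + bh, x:x + bw].std())      # per-block spread of intensities
stds = np.array(stds)
high = stds > 20                                         # split into low/high metric blocks (arbitrary)
print("proportion of high-metric blocks:", high.mean())
print("mean of high-metric blocks:", stds[high].mean() if high.any() else 0.0)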
Calculate the cumulative histogram of the image.
Fit a linear regression to the cumulative histogram, of the form y(x) = A*x + B.
Calculate the RMSE of real_cumulative_frequency(x) - y(x).
If that RMSE is close to zero, the image is already equalized. (This means that for equalized images the cumulative histogram must be linear.)
Idea is taken from here.
EDIT:
I've illustrated this approach in my blog (C example code included).
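A Python sketch of the same test (the blog's example is in C; the file name is a placeholder and the RMSE threshold you compare against is up to you):

import cv2
import numpy as np

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)         # hypothetical input image
hist = np.bincount(gray.ravel(), minlength=256).astype(float)
cum = np.cumsum(hist) / hist.sum()                          # real cumulative frequency in [0, 1]
x = np.arange(256)
A, B = np.polyfit(x, cum, 1)                                # linear fit y(x) = A*x + B
rmse = np.sqrt(np.mean((cum - (A * x + B)) ** 2))
print("RMSE:", rmse)                                        # near zero -> already equalized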
There is support for this in skimage: skimage.exposure.is_low_contrast (see the reference).
example:
>>> import numpy as np
>>> from skimage.exposure import is_low_contrast
>>> image = np.linspace(0, 0.04, 100)
>>> is_low_contrast(image)
True
>>> image[-1] = 1
>>> is_low_contrast(image)
True
>>> is_low_contrast(image, upper_percentile=100)
False

Gaussian blur and convolution kernels

I do not understand what a convolution kernel is or how I would apply a convolution matrix to the pixels in an image (I am talking about doing a Gaussian blur operation on an image).
Could I also get an explanation of how to create a kernel for a Gaussian blur operation?
I am reading this article but I cannot seem to understand how things are done...
Thanks to anyone who takes time to explain this to me :),
ExtremeCoder
The basic idea is that each new pixel of the image is created as a weighted average of the pixels close to it (imagine drawing a circle around the pixel).
For each pixel in the image you create a little square around it. Let's say you take the 8 neighbours of a pixel (including the diagonals, even though they do not matter here), and you perform a weighted average to get the middle pixel.
In the Gaussian blur case it breaks down into two one-dimensional operations. For each pixel, take some number of pixels next to it in the row direction only, multiply their values by weights computed from the Gaussian distribution (or, if you are doing this for a visual effect and not for a scientific reason, any weights that look good), and sum them up. Another way to look at it: the pixels make a vector, the weights make a vector, and you are taking their dot product. Then repeat the process in the column direction as a separate pass.
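Here is a hedged sketch of that two-pass (separable) idea in plain NumPy. The weights are an illustrative 5-tap approximation of a Gaussian, and the image borders are handled by simply repeating the edge pixels:

import numpy as np

def separable_blur(image, weights):
    # image: 2-D float array; weights: 1-D symmetric weights that sum to 1
    pad = len(weights) // 2
    # horizontal pass: weighted average of each pixel's row neighbours
    padded = np.pad(image, ((0, 0), (pad, pad)), mode="edge")
    rows = sum(w * padded[:, i:i + image.shape[1]] for i, w in enumerate(weights))
    # vertical pass: the same weights applied down each column
    padded = np.pad(rows, ((pad, pad), (0, 0)), mode="edge")
    return sum(w * padded[i:i + image.shape[0], :] for i, w in enumerate(weights))

# illustrative weights roughly following a Gaussian, normalized to sum to 1
weights = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
weights /= weights.sum()
blurred = separable_blur(np.random.rand(64, 64), weights)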
A convolution kernel is a matrix of values that specify how the neighborhood of a pixel contribute to that pixel's state in the final image. There's a fair description of the basics here. A gaussian blur is a convolution function that uses a really ugly (you've seen the wikipedia page) function to compute a convolution kernel to pass over the image. You'll find an example kernel for a gaussian in that wikipedia page.
The point of all the math in there is to produce a soft blur that resembles the scatter pattern produced by a mesh screen placed between the viewer and the image. You can think of the 'size' (the standard deviation) of the gaussian as being related to the distance between the image and the screen.
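If you would rather build the kernel yourself, here is a small sketch that evaluates the 2-D Gaussian formula from that Wikipedia page on a grid and normalizes it (the size and sigma values are illustrative):

import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # sample G(x, y) = exp(-(x^2 + y^2) / (2 * sigma^2)) on a size x size grid
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()   # normalize so the weights sum to 1 (brightness is preserved)

print(gaussian_kernel(5, 1.0).round(3))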
Here's an awesome tool, if you don't want to calculate it all by yourself (like me):
http://www.embege.com/gauss/
EDIT
Since the link seems to be broken now, here's a link to archive.org:
http://web.archive.org/web/20150217075657/http://www.embege.com/gauss
