I am looking for a "very" simple way to check if an image bitmap is blur. I do not need accurate and complicate algorithm which involves fft, wavelet, etc. Just a very simple idea even if it is not accurate.
I've thought to compute the average euclidian distance between pixel (x,y) and pixel (x+1,y) considering their RGB components and then using a threshold but it works very bad. Any other idea?
Don't calculate the average differences between adjacent pixels.
Even when a photograph is perfectly in focus, it can still contain large areas of uniform colour, like the sky for example. These will push down the average difference and mask the details you're interested in. What you really want to find is the maximum difference value.
Also, to speed things up, I wouldn't bother checking every pixel in the image. You should get reasonable results by checking along a grid of horizontal and vertical lines spaced, say, 10 pixels apart.
Here are the results of some tests with PHP's GD graphics functions using an image from Wikimedia Commons (Bokeh_Ipomea.jpg). The Sharpness values are simply the maximum pixel difference values as a percentage of 255 (I only looked in the green channel; you should probably convert to greyscale first). The numbers underneath show how long it took to process the image.
If you want them, here are the source images I used:
original
slightly blurred
blurred
Update:
There's a problem with this algorithm in that it relies on the image having a fairly high level of contrast as well as sharp focused edges. It can be improved by finding the maximum pixel difference (maxdiff), and finding the overall range of pixel values in a small area centred on this location (range). The sharpness is then calculated as follows:
sharpness = (maxdiff / (offset + range)) * (1.0 + offset / 255) * 100%
where offset is a parameter that reduces the effects of very small edges so that background noise does not affect the results significantly. (I used a value of 15.)
This produces fairly good results. Anything with a sharpness of less than 40% is probably out of focus. Here's are some examples (the locations of the maximum pixel difference and the 9×9 local search areas are also shown for reference):
(source)
(source)
(source)
(source)
The results still aren't perfect, though. Subjects that are inherently blurry will always result in a low sharpness value:
(source)
Bokeh effects can produce sharp edges from point sources of light, even when they are completely out of focus:
(source)
You commented that you want to be able to reject user-submitted photos that are out of focus. Since this technique isn't perfect, I would suggest that you instead notify the user if an image appears blurry instead of rejecting it altogether.
I suppose that, philosophically speaking, all natural images are blurry...How blurry and to which amount, is something that depends upon your application. Broadly speaking, the blurriness or sharpness of images can be measured in various ways. As a first easy attempt I would check for the energy of the image, defined as the normalised summation of the squared pixel values:
1 2
E = --- Σ I, where I the image and N the number of pixels (defined for grayscale)
N
First you may apply a Laplacian of Gaussian (LoG) filter to detect the "energetic" areas of the image and then check the energy. The blurry image should show considerably lower energy.
See an example in MATLAB using a typical grayscale lena image:
This is the original image
This is the blurry image, blurred with gaussian noise
This is the LoG image of the original
And this is the LoG image of the blurry one
If you just compute the energy of the two LoG images you get:
E = 1265 E = 88
or bl
which is a huge amount of difference...
Then you just have to select a threshold to judge which amount of energy is good for your application...
calculate the average L1-distance of adjacent pixels:
N1=1/(2*N_pixel) * sum( abs(p(x,y)-p(x-1,y)) + abs(p(x,y)-p(x,y-1)) )
then the average L2 distance:
N2= 1/(2*N_pixel) * sum( (p(x,y)-p(x-1,y))^2 + (p(x,y)-p(x,y-1))^2 )
then the ratio N2 / (N1*N1) is a measure of blurriness. This is for grayscale images, for color you do this for each channel separately.
Related
I am looking for a workflow that would clean (and possibly straighten) old and badly scanned images of musical scores (like the one below).
I've tried to use denoise, hough filters, imagemagick geometry filters, and I am struggling to identify the series of filters that remove the scanner noise/bias.
Just some quick ideas:
Remove grayscale noise: Do a low pass filter (darks), since the music is darker than a lot of the noise. Remaining noise is mostly vertical lines.
Rotate image: Sum grayscale values for each column of the image. You'll get a vector with the total pixel lightness in that column. Use gradient descent or search on the rotation of the image (within some bounds like +/-15deg rotation) to maximize the variance of that vector. Idea here is that the vertical noise lines indicate vertical alignment, and so we want the columns of the image to align with these noise lines (= maximized variance).
Remove vertical line noise: After rotation, take median value of each column. The greater the distance (squared difference) a pixel is from that median darkness, the more confident we are it is its true color (e.g. a pure white or black pixel when vertical noise was gray). Since noise is non-white, you could try blending this distance by the whiteness of the median for an alternative confidence metric. Ideally, I think here you'd train some 7x7x2 convolution filter (2 channels being pixel value and distance from median) to estimate true value of the pixel. That would be the most minimal machine learning approach, not using some full-fledged NN. However, given your lack of training data, we'll have to come up with our own heuristic for what the true pixel value is. You likely will need to play around with it, but here's what I think might work:
Set some threshold of confidence; above that threshold we take the value as is. Below the threshold, set to white (the binary expected pixel value for the entire page).
For all values below threshold, take the max confidence value within a +/-2 pixels L1 distance (e.g. 5x5 convolution) as that pixel's value. Seems like features are separated by at least 2 pixels, but for lower resolutions that window size may need to be adjusted. Since white pixels may end up being more confident overall, you could experiment with prioritizing darker pixels (increase their confidence somehow).
Clamp the image contrast and maybe run another low pass filter.
If a program displays a pixel at X,Y on a display with resolution A, can I precisely predict at what coordinates the same pixel will display at resolution B?
MORE INFORMATION
The 2 display resolutions are:
A-->1366 x 768
B-->1600 x 900
Dividing the max resolutions in each direction yields:
X-direction scaling factor = 1600/1366 = 1.171303075
Y-direction scaling factor = 900/768 = 1.171875
Say for example that the only red pixel on display A occurs at pixel (1,1). If I merely scale up using these factors, then on display B, that red pixel will be displayed at pixel (1.171303075, 1.171875). I'm not sure how to interpret that, as I'm used to thinking of pixels as integer values. It might help if I knew the exact geometry of pixel coordinates/placement on a screen. e.g., do pixel coordinates (1,1) mean that the center of the pixel is at (1,1)? Or a particular corner of the pixel is at (1,1)? I'm sure diagrams would assist in visualizing this--if anyone can post a link to helpful resources, I'd appreciate it. And finally, I may be approaching this all wrong.
Thanks in advance.
I think, your problem is related to the field of scaling/resampling images. Bitmap-, or raster images are digital photographs, so they are the most common form to represent natural images that are rich in detail. The term bitmap refers to how a given pattern (bits in a pixel) maps to a specific color. A bitmap images take the form of an array, where the value of each element, called a pixel picture element, correspond to the color of that region of the image.
Sampling
When measuring the value for a pixel, one takes the average color of an area around the location of the pixel. A simplistic model is sampling a square, and a more accurate measurement is to calculate a weighted Gaussian average. When perceiving a bitmap image the human eye should blend the pixel values together, recreating an illusion of the continuous image it represents.
Raster dimensions
The number of horizontal and vertical samples in the pixel grid is called raster dimensions, it is specified as width x height.
Resolution
Resolution is a measurement of sampling density, resolution of bitmap images give a relationship between pixel dimensions and physical dimensions. The most often used measurement is ppi, pixels per inch.
Scaling / Resampling
Image scaling is the name of the process when we need to create an image with different dimensions from what we have. A different name for scaling is resampling. When resampling algorithms try to reconstruct the original continuous image and create a new sample grid. There are two kind of scaling: up and down.
Scaling image down
The process of reducing the raster dimensions is called decimation, this can be done by averaging the values of source pixels contributing to each output pixel.
Scaling image up
When we increase the image size we actually want to create sample points between the original sample points in the original raster, this is done by interpolation the values in the sample grid, effectively guessing the values of the unknown pixels. This interpolation can be done by nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation, etc. But the scaled up/down image must be also represented over discrete grid.
Given an image (Like the one given below) I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried with two thresholding functions. The problem is I cant get the perfect edges using either of these functions. Any help would be greatly appreciated.
The filters I have tried are, the Euclidean distance in the RGB and HSV spaces.
Sample image:
Here it is after running an RGB threshold filter. (40% it more artefects after this)
Here it is after running an HSV threshold filter. (at 30% the paths become barely visible but clearly unusable because of the noise)
The code I am using is pretty straightforward. Change the input image to appropriate color spaces and check the Euclidean distance with the the black color.
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. Best results would come with higher than 11x11 size window, don't forget to choose a negative value for the threshold.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel )
Pixel_Binary = 1;
else
Pixel_Binary = 0;
Due to the noise and the illumination variation you may need an adaptive local thresholding, thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do the mean or the median local thresholding, I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to : http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more
information on this techniques.
To make sure the thresholding was working fine, I skeletonized it to see if there is a line break. This skeleton may be the one needed for further processing.
To get ride of the remaining noise you can just find the longest connected component in the skeletonized image.
Thank you.
You probably want to do this as a three-step operation.
use leveling, not just thresholding: Take the input and scale the intensities (gamma correct) with parameters that simply dull the mid tones, without removing the darks or the lights (your rgb threshold is too strong, for instance. you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
You could try a black top-hat transform. This involves substracting the Image from the closing of the Image. I used a structural element window size of 11 and a constant threshold of 0.1 (25.5 on for a 255 scale)
You should get something like:
Which you can then easily threshold:
Best of luck.
Anyone know what is the difference between gamma and exposure? And what is the difference between gamma correction and exposure adjustment in image processing?
Since you don't have an image processing background i would start with a basics
1) Every digital image has a dynamic range of gray levels.Now gray levels are nothing but values which ultimately corresponds to a color. Say Mono-chrome image(Black and white image) has only 2 gray levels i.e. 0 and 1 where 0 means black and 1 means white color. Here the dynamic range is [0-1]. In these images each pixel is stored as a single bit.
Similarly there is Gray-scale images have shades of gray in them. Here each pixel is stored as 8-bit so dynamic range is [0-255]. How? just apply the formula (2^n -1) where n is number of bits. i.e. (2^8 - 1) i.e. 256-1 = 255.
Similarly there are color-images which are 24-bit images.In general the dynamic range of gray levels in image is given by [0 - L-1] where L is number of gray levels.
2) Now once you have understood what is dynamic range lets understand Gamma correction.Gamma correction is nothing but a function that compress the dynamic range of images so that we can view the image more nicely or properly. But why do we need to compress dynamic range? A best day to day example is during day time when we cannot see the stars, the reason is because the intensity of sun is so large as compared to the intensity of stars that we cannot see the stars in day time.Similarly when dynamic range is high in an image then that of the display device we cannot see the image properly. Therefore we can use gamma correction to compress the dynamic range of image
3) Gamma correction can be written as g(x,y) = c * f(x,y) ^ # where # is symbol of gamma (since i don't know how to write gamma symbol here, i have used #) and f(x,y) is original image with high dynamic range, g(x,y) is modified image. C is a positive constant.
4) Exposure as said earlier in an answer its phenomena in camera. I don't know much about it as it is not covered in the syllabus of image processing which i am currently studying.
Gamma correction is a non-linear global function that compresses certain ranges in your image. It is mainly used in order to be more efficient from human vision point of view, in fixed point format. It is absent in raw files, but exists in JPEG. Each pixel undergoes the following transformation:
y = x^p
Exposure is a physical phenomenon in your camera. Exposure adjustment on the other hand is linear global function. It is used mainly in order to compensate for lack or excess of exposure in the camera:
y = a*x
Exposure is an indication of the total quantity of light that reaches the CCD of your camera (or the silver ions on film). It can be expressed as the number of photons that hit your image-recording elements.
Films and CCD are calibrated to expect a certain quantity of light (certain number of photons) in order to be able to create an "average" image.
The higher the "expected" quantity of light, the lower the ISO number of your film (or camera setting) => in order to obtain a normal image, a film (or camera setting) of 100 ISO needs more light than a film of 3200 ISO, hence the use of 3200 ISO films for night photography.
next step: the camera thing. When you want to make a picture (= have photons hit your CCD or film), you need to open the diaphragm of your camera. Depending on how much you open your diaphragm, the nature of your image will change (speaking from an artistic point of view here). If your diaphragm is wide open, most of the image which is not perfectly in focus will be blurred (e.g. as used in portrait photography). Conversely, if your diaphragm is only a little bit open during exposure, most of your image will be very sharp. This is used very often for landscape photography.
As your film (or CCD) expect a certain quantity of light with a given ISO value, it is obvious that a smaller diaphragm opening requires longer exposure times whereas a wide open diaphraghm requires a very short time.
Good books about this subject are the series "The Camera", "The Negative" and "The Print" by Ansel Adams.
Conclusion: exposure and gamma correction are different things.
- Exposure is a part of the parameters you need to control while creating your initial image through the use of a camera.
- Gamma correction is related to subsequent manipulation of your image file. I'm not sure if the notion of "gamma correction" is being used in the context of film.
Basically:
Gamma is a monitor thing.
Exposure is a camera thing.
When we look at a photo of a group of trees, we are able to identify that the photo is predominantly green and brown, or for a picture of the sea we are able to identify that it is mostly blue.
Does anyone know of an algorithm that can be used to detect the prominent color or colours in a photo?
I can envisage a 3D clustering algorithm in RGB space or something similar. I was wondering if someone knows of an existing technique.
Convert the image from RGB to a color space with brightness and saturation separated (HSL/HSV)
http://en.wikipedia.org/wiki/HSL_and_HSV
Then find the dominating values for the hue component of each pixel. Make a histogram for the hue values of each pixel and analyze in which angle region the peaks fall in. A large peak in the quadrant between 180 and 270 degrees means there is a large portion of blue in the image, for example.
There can be several difficulties in determining one dominant color. Pathological example: an image whose left half is blue and right half is red. Also, the hue will not deal very well with grayscales obviously. So a chessboard image with 50% white and 50% black will suffer from two problems: the hue is arbitrary for a black/white image, and there are two colors that are exactly 50% of the image.
It sounds like you want to start by computing an image histogram or color histogram of the image. The predominant color(s) will be related to the peak(s) in the histogram.
You might want to change the image from RGB to indexed, then you could use a regular histogram and detect the pics (Matlab does this with rgb2ind(), as you probably already know), and then the problem would be reduced to your regular "finding peaks in an array".
Then
n = hist(Y,nbins) bins the elements in vector Y into 10 equally spaced containers and returns the number of elements in each container as a row vector.
Those values in n will give you how many elements in each bin. Then it's just a matter of fiddling with the number of bins to make them wide enough, and with how many elements in each would make you count said bin as a predominant color, then taking the bins that contain those many elements, calculating the index that corresponds with their middle, and converting it to RGB again.
Whatever you're using for your processing probably has similar functions to those
Average all pixels in the image.
Remove all pixels that are farther away from the average color than standard deviation.
GOTO 1 with remaining pixels until arbitrarily few are left (1 or maybe 1%).
You might also want to pre-process the image, for example apply high-pass filter (removing only very low frequencies) to even out lighting in the photo — http://en.wikipedia.org/wiki/Checker_shadow_illusion