How can I threshold this blurry image to make the digits as clear as possible?
In a previous post, I tried adaptively thresholding a blurry image (left), which resulted in distorted and disconnected digits (right):
Since then, I've tried using a morphological closing operation as described in this post to make the brightness of the image uniform:
If I adaptively threshold this image, I don't get significantly better results. However, because the brightness is approximately uniform, I can now use an ordinary threshold:
This is a lot better than before, but I have two problems:
I had to manually choose the threshold value. Although the closing operation results in uniform brightness, the level of brightness might be different for other images.
Different parts of the image would do better with slight variations in the threshold level. For instance, the 9 and 7 in the top left come out partially faded and should have a lower threshold, while some of the 6s have fused into 8s and should have a higher threshold.
I thought that going back to an adaptive threshold, but with a very large block size (1/9th of the image) would solve both problems. Instead, I end up with a weird "halo effect" where the centre of the image is a lot brighter, but the edges are about the same as the normally-thresholded image:
Edit: remi suggested morphologically opening the thresholded image at the top right of this post. This doesn't work too well. Using elliptical kernels, only a 3x3 is small enough to avoid obliterating the image entirely, and even then there are significant breakages in the digits:
Edit2: mmgp suggested using a Wiener filter to remove blur. I adapted this code for Wiener filtering in OpenCV to OpenCV4Android, but it makes the image even blurrier! Here's the image before (left) and after filtering with my code and a 5x5 kernel:
Here is my adapted code, which filters in-place:
private void wiener(Mat input, int nRows, int nCols) { // I tried nRows=5 and nCols=5
Mat localMean = new Mat(input.rows(), input.cols(), input.type());
Mat temp = new Mat(input.rows(), input.cols(), input.type());
Mat temp2 = new Mat(input.rows(), input.cols(), input.type());
// Create the kernel for convolution: a constant matrix with nRows rows
// and nCols cols, normalized so that the sum of the pixels is 1.
Mat kernel = new Mat(nRows, nCols, CvType.CV_32F, new Scalar(1.0 / (double) (nRows * nCols)));
// Get the local mean of the input. localMean = convolution(input, kernel)
Imgproc.filter2D(input, localMean, -1, kernel, new Point(nCols/2, nRows/2), 0);
// Get the local variance of the input. localVariance = convolution(input^2, kernel) - localMean^2
Core.multiply(input, input, temp); // temp = input^2
Imgproc.filter2D(temp, temp, -1, kernel, new Point(nCols/2, nRows/2), 0); // temp = convolution(input^2, kernel)
Core.multiply(localMean, localMean, temp2); //temp2 = localMean^2
Core.subtract(temp, temp2, temp); // temp = localVariance = convolution(input^2, kernel) - localMean^2
// Estimate the noise as mean(localVariance)
Scalar noise = Core.mean(temp);
// Compute the result. result = localMean + max(0, localVariance - noise) / max(localVariance, noise) * (input - localMean)
Core.max(temp, noise, temp2); // temp2 = max(localVariance, noise)
Core.subtract(temp, noise, temp); // temp = localVariance - noise
Core.max(temp, new Scalar(0), temp); // temp = max(0, localVariance - noise)
Core.divide(temp, temp2, temp); // temp = max(0, localVar-noise) / max(localVariance, noise)
Core.subtract(input, localMean, input); // input = input - localMean
Core.multiply(temp, input, input); // input = max(0, localVariance - noise) / max(localVariance, noise) * (input - localMean)
Core.add(input, localMean, input); // input = localMean + max(0, localVariance - noise) / max(localVariance, noise) * (input - localMean)
}
Some hints that you might try out:
Apply the morphological opening in your original thresholded image (the one which is noisy at the right of the first picture). You should get rid of most of the background noise and be able to reconnect the digits.
Use a different preprocessing of your original image instead of morpho closing, such as median filter (tends to blur the edges) or bilateral filtering which will preserve better the edges but is slower to compute.
As far as threshold is concerned, you can use CV_OTSU flag in the cv::threshold to determine an optimal value for a global threshold. Local thresholding might still be better, but should work better with the bilateral or median filter
I've tried thresholding each 3x3 box separately, using Otsu's algorithm (CV_OTSU - thanks remi!) to determine an optimal threshold value for each box. This works a bit better than thresholding the entire image, and is probably a bit more robust.
Better solutions are welcome, though.
If you're willing to spend some cycles on it there are de-blurring techniques that could be used to sharpen up the picture prior to processing. Nothing in OpenCV yet but if this is a make-or-break kind of thing you could add it.
There's a bunch of literature on the subject:
http://www.cse.cuhk.edu.hk/~leojia/projects/motion_deblurring/index.html
http://www.google.com/search?q=motion+deblurring
And some chatter on the OpenCV mailing list:
http://tech.groups.yahoo.com/group/OpenCV/message/20938
The weird "halo effect" that you're seeing is likely due to OpenCV assuming black for the color when the adaptive threshold is at/near the edge of the image and the window that it's using "hangs over" the edge into non-image territory. There are ways to correct for this, most likely you would make an temporary image that's at least two full block-sizes taller and wider than the image from the camera. Then copy the camera image into the middle of it. Then set the surrounding "blank" portion of the temp image to be the average color of the image from the camera. Now when you perform the adaptive threshold the data at/near the edges will be much closer to accurate. It won't be perfect since its not a real picture but it will yield better results than the black that OpenCV is assuming is there.
My proposal assumes you can identify the sudoku cells, which I think, is not asking too much. Trying to apply morphological operators (although I really like them) and/or binarization methods as a first step is the wrong way here, in my opinion of course. Your image is at least partially blurry, for whatever reason (original camera angle and/or movement, among other reasons). So what you need is to revert that, by performing a deconvolution. Of course asking for a perfect deconvolution is too much, but we can try some things.
One of these "things" is the Wiener filter, and in Matlab, for instance, the function is named deconvwnr. I noticed the blurry to be in the vertical direction, so we can perform a deconvolution with a vertical kernel of certain length (10 in the following example) and also assume the input is not noise free (assumption of 5%) -- I'm just trying to give a very superficial view here, take it easy. In Matlab, your problem is at least partially solved by doing:
f = imread('some_sudoku_cell.png');
g = deconvwnr(f, fspecial('motion', 10, 90), 0.05));
h = im2bw(g, graythresh(g)); % graythresh is the Otsu method
Here are the results from some of your cells (original, otsu, otsu of region growing, morphological enhanced image, otsu from morphological enhanced image with region growing, otsu of deconvolution):
The enhanced image was produced by performing original + tophat(original) - bottomhat(original) with a flat disk of radius 3. I manually picked the seed point for region growing and manually picked the best threshold.
For empty cells you get weird results (original and otsu of deconvlution):
But I don't think you would have trouble to detect whether a cell is empty or not (the global threshold already solves it).
EDIT:
Added the best results I could get with a different approach: region growing. I also attempted some other approaches, but this was the second best one.
Related
The image below has many circles. Click and zoom in to see the circles.
https://drive.google.com/open?id=1ox3kiRX5hf2tHDptWfgcbMTAHKCDizSI
What I want is counting the circles using any free language, such as python.
Is there a function or idea to do it?
Edit: I came up with a better solution, partially inspired by this answer below. I thought of this method originally (as noted in the OP comments) but I decided against it. The original image was just not good enough quality for it. However I improved that method and it works brilliantly for the better quality image. The original approach is first, and then the new approach at the bottom.
First approach
So here's a general approach that seems to work well, but definitely just gives estimates. This assumes that circles are roughly the same size.
First, the image is mostly blue---so it seems reasonable to just do the analysis on the blue channel. Thresholding the blue channel, in this case, using Otsu thresholding (which determines an optimal threshold value without input) seems to work very well. This isn't too much of a surprise since the distribution of color values is pretty much binary. Check the mask that results from it!
Then, do a connected component analysis on the mask to get the area of each component (component = white blob in the mask). The statistics returned from connectedComponentsWithStats() give (among other things) the area, which is exactly what we need. Then we can simply count the circles by estimating how many circles fit in a given component based on its area. Also note that I'm taking the statistics for every label except the first one: this is the background label 0, and not any of the white blobs.
Now, how large in area is a single circle? It would be best to let the data tell us. So you could compute a histogram of all the areas, and since there are more single circles than anything else, there will be a high concentration around 250-270 pixels or so for the area. Or you could just take an average of all the areas between something like 50 and 350 which should also get you in a similar ballpark.
Really in this histogram you can see the demarcations between single circles, double circles, triple, and so on quite easily. Only the larger components will give pretty rough estimates. And in fact, the area doesn't seem to scale exactly linearly. Blobs of two circles are slightly larger than two single circles, and blobs of three are larger still than three single circles, and so on, so this makes it a little difficult to estimate nicely, but rounding should still keep us close. If you want you could include a small multiplication parameter that increases as the area increases to account for that, but that would be hard to quantify without going through the histogram analytically...so, I didn't worry about this.
A single circle area divided by the average single circle area should be close to 1. And the area of a 5-circle group divided by the average circle area should be close to 5. And this also means that small insignificant components, that are 1 or 10 or even 100 pixels in area, will not count towards the total since round(50/avg_circle_size) < 1/2, so those will round down to a count of 0. Thus I should just be able to take all the component areas, divide them by the average circle size, round, and get to a decent estimate by summing them all up.
import cv2
import numpy as np
img = cv2.imread('circles.png')
mask = cv2.threshold(img[:, :, 0], 255, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
stats = cv2.connectedComponentsWithStats(mask, 8)[2]
label_area = stats[1:, cv2.CC_STAT_AREA]
min_area, max_area = 50, 350 # min/max for a single circle
singular_mask = (min_area < label_area) & (label_area <= max_area)
circle_area = np.mean(label_area[singular_mask])
n_circles = int(np.sum(np.round(label_area / circle_area)))
print('Total circles:', n_circles)
This code is simple and effective for rough counts.
However, there are definitely some assumptions here about the groups of circles compared to a normal circle size, and there are issues where circles that are at the boundaries will not be counted correctly (these aren't well defined---a two circle blob that is half cut off will look more like one circle---no clear way to count or not count these with this method). Further I just used automatic thresholding via Otsu here; you could get (probably better) results with more careful color filtering. Additionally in the mask generated by Otsu, some circles that are masked have a few pixels removed from their center. Morphology could add these pixels back in, which would give you a (slightly larger) more accurate area for the single circle components. Either way, I just wanted to give the general idea towards how you could easily estimate this with minimal code.
New approach
Before, the goal was to count circles. This new approach instead counts the centers of the circles. The general idea is you threshold and then flood fill from a background pixel to fill in the background (flood fill works like the paint bucket tool in photo editing apps), that way you only see the centers, as shown in this answer below.
However, this relies on global thresholding, which isn't robust to local lighting changes. This means that since some centers are brighter/darker than others, you won't always get good results with a single threshold.
Here I've created an animation to show looping through different threshold values; watch as some centers appear and disappear at different times, meaning you get different counts depending on the threshold you choose (this is just a small patch of the image, it happens everywhere):
Notice that the first blob to appear in the top left actually disappears as the threshold increases. However, if we actually OR each frame together, then each detected pixel persists:
But now every single speck appears, so we should clean up the mask each frame so that we remove single pixels as they come (otherwise they may build up and be hard to remove later). Simple morphological opening with a small kernel will remove them:
Applied over the whole image, this method works incredibly well and finds almost every single cell. There are only three false positives (detected blob that's not a center) and two misses I can spot, and the code is very simple. The final thing to do after the mask has been created is simply count the components, minus one for the background. The only user input required here is a single point to flood fill from that is in the background (seed_pt in the code).
img = cv2.imread('circles.png', 0)
seed_pt = (25, 25)
fill_color = 0
mask = np.zeros_like(img)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
for th in range(60, 120):
prev_mask = mask.copy()
mask = cv2.threshold(img, th, 255, cv2.THRESH_BINARY)[1]
mask = cv2.floodFill(mask, None, seed_pt, fill_color)[1]
mask = cv2.bitwise_or(mask, prev_mask)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
n_centers = cv2.connectedComponents(mask)[0] - 1
print('There are %d cells in the image.'%n_centers)
There are 874 cells in the image.
One possible solution would be to read the image using OpenCV, get its grayscale, then use Canny edge detection and perform countour finding in OpenCV. This will return a list of countours. It would look something like:
import cv2
image = cv2.imread('path-to-your-image')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# tweak the parameters of the GaussianBlur for best performance
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
# again, try different values here
edged = cv2.Canny(blurred, 20, 140)
(_, contours, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours))
If you have all images like this - consider thresholding it, not necessarily by auto threshold-seeking algorithm like Otsu, but rather using simplest threshold by a given threshold value. Yes, before thresholding you have to convert your color input to gray-scale, or take one of color channels. Then based on few experiments with channels and threshold values - determine threshold value to have circles with holes in monochrome thresholding result. Based on your png image I found value of 81 (intensity of gray varies from 0 to 255) to be great to threshold gray-scale version of your input to have such binary image with holes in place, as described above.
Then simply count those holes.
Holes can be determined by seed-filling white area, connected to image border. As result you will have white hole connected components on black background - so simply count them.
More details you can find here http://www.leptonica.com/filling.html and use leptonica primitives to do thresholding, hole counting an so on.
From pictures of tools on a sheet of paper, I'm asked to find their outline contour to vectorize them.
I'm a total beginner in computer-vision-related problems and the only thing I thought about was OpenCV and edge detection.
The result is better than what I've imagined, this is still very unreliable, especially if the source picture isn't "perfect".
I took 2 photographies of a wrench they gave me.
After playing around with opencv bindings for node, I get this:
Then, I've tried with the less-good picture:
That's totally inexploitable.
I can get something a little better by changing the Canny thresold, but that must be automatized (given that the picture is relatively correct).
So I've got a few questions:
Am I taking the right approach? Is GrabCut better for this? A combination of Grabcut and Canny edge detection? I still need vertices at the end, but I feel that GrabCut does what I want too.
The borders are rough and have certain errors. I can augment approxPolyDP's multiplier, but without a loss of precision on good parts.
Related to the above point, I'm thinking of integrating Savitzky-Golay algorithm to smooth the outline, instead of polygon simplification with approxPolyDP. Is it a good idea?
Normally, the line of the outer border must form a simple, cuttable block. Is there a way in OpenCL to avoid that line to do impossible things, like passing on itself? - Or, simply, detect the problem? Those configurations are, of course, impossible but happen when the detection is failed (like in the second pic).
I'm searching a way to do automatic Canny thresold calculation, since I must tweak it manually for each image. Do you have a good example for that?
I noticed that converting the image to grayscale before edge detection sometimes deteriorates the result, and sometimes makes it better. Which one should I choose? (tools can be of any color btw!)
here is the source for my tests:
const cv = require('opencv');
const lowThresh = 90;
const highThresh = 90;
const nIters = 1;
const GRAY = [120, 120, 120];
const WHITE = [255, 255, 255];
cv.readImage('./files/viv1.jpg', function(err, im) {
if (err) throw err;
width = im.width()
height = im.height()
if (width < 1 || height < 1) throw new Error('Image has no size');
const out = new cv.Matrix(height, width);
im.convertGrayscale();
im_canny = im.copy();
im_canny.canny(lowThresh, highThresh);
im_canny.dilate(nIters);
contours = im_canny.findContours();
let maxArea = 0;
let biggestContour;
for (i = 0; i < contours.size(); i++) {
const area = contours.area(i);
if (area > maxArea) {
maxArea = area;
biggestContour = i;
}
out.drawContour(contours, i, GRAY);
}
const arcLength = contours.arcLength(biggestContour, true);
contours.approxPolyDP(biggestContour, 0.001 * arcLength, true);
out.drawContour(contours, biggestContour, WHITE, 5);
out.save('./tmp/out.png');
console.log('Image saved to ./tmp/out.png');
});
You'll need to add some pre-processing to clean up the image. Because you have a large variation in intensities in the image because of shadow, poor lighting, high shine on tools, etc you should equalize the image. This will help you get a better response in the regions that are currently poorly lit or have high shine.
Here's an opencv tutorial on Histogram equalization in C++: http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/histogram_equalization/histogram_equalization.html
Hope this helps
EDIT:
You can have an automatic threshold based on some loss function(?). For eg: If you know that the tool will be completely captured in the frame, you know that you should get a high value at every column from x = 10 to x = 800(say). You could then keep reducing the threshold until you get a high value at every column from x = 10 to x = 800. This is a very naive way of doing it, but its an interesting experiment, I think, since you are generating the images yourself and have control over object placement.
You might also try running your images through an adaptive threshold first. This type of binarization is fairly adept at segmenting foreground and background in cases like this, even with inconsistent lighting/shadows (which seems to be the issue in your second example above). Adathresh will require some parameter fine-tuning, but once the entire tool is segmented from the background, Canny edge detection should produce more consistent results.
As for the roughness in your contours, you could try setting your findContours mode to one of the CV_CHAIN_APPROX methods described here.
I found on the internet that laplacian method is quite good technique to compute the sharpness of a image. I was trying to implement it in opencv 2.4.10. How can I get the sharpness measure after applying the Laplacian function? Below is the code:
Mat src_gray, dst;
int kernel_size = 3;
int scale = 1;
int delta = 0;
int ddepth = CV_16S;
GaussianBlur( src, src, Size(3,3), 0, 0, BORDER_DEFAULT );
/// Convert the image to grayscale
cvtColor( src, src_gray, CV_RGB2GRAY );
/// Apply Laplace function
Mat abs_dst;
Laplacian( src_gray, dst, ddepth, kernel_size, scale, delta, BORDER_DEFAULT );
//compute sharpness
??
Can someone please guide me on this?
Possible duplicate of: Is there a way to detect if an image is blurry?
so your focus measure is:
cv::Laplacian(src_gray, dst, CV_64F);
cv::Scalar mu, sigma;
cv::meanStdDev(dst, mu, sigma);
double focusMeasure = sigma.val[0] * sigma.val[0];
Edit #1:
Okay, so a well focused image is expected to have sharper edges, so the use of image gradients are instrumental in order to determine a reliable focus measure. Given an image gradient, the focus measure pools the data at each point as an unique value.
The use of second derivatives is one technique for passing the high spatial frequencies, which are associated with sharp edges. As a second derivative operator we use the Laplacian operator, that is approximated using the mask:
To pool the data at each point, we use two methods. The first one is the sum of all the absolute values, driving to the following focus measure:
where L(m, n) is the convolution of the input image I(m, n) with the mask L. The second method calculates the variance of the absolute values, providing a new focus measure given by:
where L overline is the mean of absolute values.
Read the article
J.L. Pech-Pacheco, G. Cristobal, J. Chamorro-Martinez, J.
Fernandez-Valdivia, "Diatom autofocusing in brightfield microscopy: a
comparative study", 15th International Conference on Pattern
Recognition, 2000. (Volume:3 )
for more information.
Not exactly the answer, but I got a formula using an intuitive approach that worked on the wild.
I'm currently working in a script to detect multiple faces in a picture with a crowd, using mtcnn , which it worked very well, however it also detected many faces so blurry that you couldn't say it was properly a face.
Example image:
Faces detected:
Matrix of detected faces:
mtcnn detected about 123 faces, however many of them had little resemblance as a face. In fact, many faces look more like a stain than anything else...
So I was looking a way of 'filtering' those blurry faces. I tried the Laplacian filter and FFT way of filtering I found on this answer , however I had inconsistent results and poor filtering results.
I turned my research in computer vision topics, and finally tried to implement an 'intuitive' way of filtering using the following principle:
When more blurry is an image, less 'edges' we have
If we compare a crisp image with a blurred version of the same image, the results tends to 'soften' any edges or adjacent contrasting regions. Based on that principle, I was finding a way of weighting edges and then a simple way of 'measuring' the results to get a confidence value.
I took advantage of Canny detection in OpenCV and then apply a mean value of the result (Python):
def getBlurValue(image):
canny = cv2.Canny(image, 50,250)
return np.mean(canny)
Canny return 2x2 array same image size . I selected threshold 50,250 but it can be changed depending of your image and scenario.
Then I got the average value of the canny result, (definitively a formula to be improved if you know what you're doing).
When an image is blurred the result will get a value tending to zero, while crisp image tend to be a positive value, higher when crisper is the image.
This value depend on the images and threshold, so it is not a universal solution for every scenario, however a best value can be achieved normalizing the result and averaging all the faces (I need more work on that subject).
In the example, the values are in the range 0-27.
I averaged all faces and I got about a 3.7 value of blur
If I filter images above 3.7:
So I kept with mosth crisp faces:
That consistently gave me better results than the other tests.
Ok, you got me. This is a tricky way of detecting a blurriness values inside the same image space. But I hope people can take advantage of this findings and apply what I learned in its own projects.
I am looking for a "very" simple way to check if an image bitmap is blur. I do not need accurate and complicate algorithm which involves fft, wavelet, etc. Just a very simple idea even if it is not accurate.
I've thought to compute the average euclidian distance between pixel (x,y) and pixel (x+1,y) considering their RGB components and then using a threshold but it works very bad. Any other idea?
Don't calculate the average differences between adjacent pixels.
Even when a photograph is perfectly in focus, it can still contain large areas of uniform colour, like the sky for example. These will push down the average difference and mask the details you're interested in. What you really want to find is the maximum difference value.
Also, to speed things up, I wouldn't bother checking every pixel in the image. You should get reasonable results by checking along a grid of horizontal and vertical lines spaced, say, 10 pixels apart.
Here are the results of some tests with PHP's GD graphics functions using an image from Wikimedia Commons (Bokeh_Ipomea.jpg). The Sharpness values are simply the maximum pixel difference values as a percentage of 255 (I only looked in the green channel; you should probably convert to greyscale first). The numbers underneath show how long it took to process the image.
If you want them, here are the source images I used:
original
slightly blurred
blurred
Update:
There's a problem with this algorithm in that it relies on the image having a fairly high level of contrast as well as sharp focused edges. It can be improved by finding the maximum pixel difference (maxdiff), and finding the overall range of pixel values in a small area centred on this location (range). The sharpness is then calculated as follows:
sharpness = (maxdiff / (offset + range)) * (1.0 + offset / 255) * 100%
where offset is a parameter that reduces the effects of very small edges so that background noise does not affect the results significantly. (I used a value of 15.)
This produces fairly good results. Anything with a sharpness of less than 40% is probably out of focus. Here's are some examples (the locations of the maximum pixel difference and the 9×9 local search areas are also shown for reference):
(source)
(source)
(source)
(source)
The results still aren't perfect, though. Subjects that are inherently blurry will always result in a low sharpness value:
(source)
Bokeh effects can produce sharp edges from point sources of light, even when they are completely out of focus:
(source)
You commented that you want to be able to reject user-submitted photos that are out of focus. Since this technique isn't perfect, I would suggest that you instead notify the user if an image appears blurry instead of rejecting it altogether.
I suppose that, philosophically speaking, all natural images are blurry...How blurry and to which amount, is something that depends upon your application. Broadly speaking, the blurriness or sharpness of images can be measured in various ways. As a first easy attempt I would check for the energy of the image, defined as the normalised summation of the squared pixel values:
1 2
E = --- Σ I, where I the image and N the number of pixels (defined for grayscale)
N
First you may apply a Laplacian of Gaussian (LoG) filter to detect the "energetic" areas of the image and then check the energy. The blurry image should show considerably lower energy.
See an example in MATLAB using a typical grayscale lena image:
This is the original image
This is the blurry image, blurred with gaussian noise
This is the LoG image of the original
And this is the LoG image of the blurry one
If you just compute the energy of the two LoG images you get:
E = 1265 E = 88
or bl
which is a huge amount of difference...
Then you just have to select a threshold to judge which amount of energy is good for your application...
calculate the average L1-distance of adjacent pixels:
N1=1/(2*N_pixel) * sum( abs(p(x,y)-p(x-1,y)) + abs(p(x,y)-p(x,y-1)) )
then the average L2 distance:
N2= 1/(2*N_pixel) * sum( (p(x,y)-p(x-1,y))^2 + (p(x,y)-p(x,y-1))^2 )
then the ratio N2 / (N1*N1) is a measure of blurriness. This is for grayscale images, for color you do this for each channel separately.
What is a fast and reliable way to threshold images with possible blurring and non-uniform brightness?
Example (blurring but uniform brightness):
Because the image is not guaranteed to have uniform brightness, it's not feasible to use a fixed threshold. An adaptive threshold works alright, but because of the blurriness it creates breaks and distortions in the features (here, the important features are the Sudoku digits):
I've also tried using Histogram Equalization (using OpenCV's equalizeHist function). It increases contrast without reducing differences in brightness.
The best solution I've found is to divide the image by its morphological closing (credit to this post) to make the brightness uniform, then renormalize, then use a fixed threshold (using Otsu's algorithm to pick the optimal threshold level):
Here is code for this in OpenCV for Android:
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(19,19));
Mat closed = new Mat(); // closed will have type CV_32F
Imgproc.morphologyEx(image, closed, Imgproc.MORPH_CLOSE, kernel);
Core.divide(image, closed, closed, 1, CvType.CV_32F);
Core.normalize(closed, image, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);
Imgproc.threshold(image, image, -1, 255, Imgproc.THRESH_BINARY_INV
+Imgproc.THRESH_OTSU);
This works great but the closing operation is very slow. Reducing the size of the structuring element increases speed but reduces accuracy.
Edit: based on DCS's suggestion I tried using a high-pass filter. I chose the Laplacian filter, but I would expect similar results with Sobel and Scharr filters. The filter picks up high-frequency noise in the areas which do not contain features, and suffers from similar distortion to the adaptive threshold due to blurring. it also takes about as long as the closing operation. Here is an example with a 15x15 filter:
Edit 2: Based on AruniRC's answer, I used Canny edge detection on the image with the suggested parameters:
double mean = Core.mean(image).val[0];
Imgproc.Canny(image, image, 0.66*mean, 1.33*mean);
I'm not sure how to reliably automatically fine-tune the parameters to get connected digits.
Using Vaughn Cato and Theraot's suggestions, I scaled down the image before closing it, then scaled the closed image up to regular size. I also reduced the kernel size proportionately.
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(5,5));
Mat temp = new Mat();
Imgproc.resize(image, temp, new Size(image.cols()/4, image.rows()/4));
Imgproc.morphologyEx(temp, temp, Imgproc.MORPH_CLOSE, kernel);
Imgproc.resize(temp, temp, new Size(image.cols(), image.rows()));
Core.divide(image, temp, temp, 1, CvType.CV_32F); // temp will now have type CV_32F
Core.normalize(temp, image, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);
Imgproc.threshold(image, image, -1, 255,
Imgproc.THRESH_BINARY_INV+Imgproc.THRESH_OTSU);
The image below shows the results side-by-side for 3 different methods:
Left - regular size closing (432 pixels), size 19 kernel
Middle - half-size closing (216 pixels), size 9 kernel
Right - quarter-size closing (108 pixels), size 5 kernel
The image quality deteriorates as the size of the image used for closing gets smaller, but the deterioration isn't significant enough to affect feature recognition algorithms. The speed increases slightly more than 16-fold for the quarter-size closing, even with the resizing, which suggests that closing time is roughly proportional to the number of pixels in the image.
Any suggestions on how to further improve upon this idea (either by further reducing the speed, or reducing the deterioration in image quality) are very welcome.
Alternative approach:
Assuming your intention is to have the numerals to be clearly binarized ... shift your focus to components instead of the whole image.
Here's a pretty easy approach:
Do a Canny edgemap on the image. First try it with parameters to Canny function in the range of the low threshold to 0.66*[mean value] and the high threshold to 1.33*[mean value]. (meaning the mean of the greylevel values).
You would need to fiddle with the parameters a bit to get an image where the major components/numerals are visible clearly as separate components. Near perfect would be good enough at this stage.
Considering each Canny edge as a connected component (i.e. use the cvFindContours() or its C++ counterpart, whichever) one can estimate the foreground and background greylevels and reach a threshold.
For the last bit, do take a look at sections 2. and 3. of this paper. Skipping most of the non-essential theoretical parts it shouldn't be too difficult to have it implemented in OpenCV.
Hope this helped!
Edit 1:
Based on the Canny edge thresholds here's a very rough idea just sufficient to fine-tune the values. The high_threshold controls how strong an edge must be before it is detected. Basically, an edge must have gradient magnitude greater than high_threshold to be detected in the first place. So this does the initial detection of edges.
Now, the low_threshold deals with connecting nearby edges. It controls how much nearby disconnected edges will get combined together into a single edge. For a better idea, read "Step 6" of this webpage. Try setting a very small low_threshold and see how things come about. You could discard that 0.66*[mean value] thing if it doesn't work on these images - its just a rule of thumb anyway.
We use Bradleys algorithm for very similar problem (to segment letters from background, with uneven light and uneven background color), described here: http://people.scs.carleton.ca:8008/~roth/iit-publications-iti/docs/gerh-50002.pdf, C# code here: http://code.google.com/p/aforge/source/browse/trunk/Sources/Imaging/Filters/Adaptive+Binarization/BradleyLocalThresholding.cs?r=1360. It works on integral image, which can be calculated using integral function of OpenCV. It is very reliable and fast, but itself is not implemented in OpenCV, but is easy to port.
Another option is adaptiveThreshold method in openCV, but we did not give it a try: http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#adaptivethreshold. The MEAN version is the same as bradleys, except that it uses a constant to modify the mean value instead of a percentage, which I think is better.
Also, good article is here: https://dsp.stackexchange.com/a/2504
You could try working on a per-tile basis if you know you have a good crop of the grid. Working on 9 subimages rather than the whole pic will most likely lead to more uniform brightness on each subimage. If your cropping is perfect you could even try going for each digit cell individually; but it all depends on how reliable is your crop.
Ellipse shape is complex to calculate if compared to a flat shape.
Try to change:
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(19,19));
to:
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(19,19));
can speed up your enough solution with low impact to accuracy.