Rough-edged text after applying Otsu's threshold for text extraction - image-processing

I am trying to separate text from background by using Otsu's threshold mechanism. Even though the algorithm separates text from background, the resultant text has rough edges, which in turn decreases the accuracy of text recognition.
The input image and the output image after applying threshold are given below:
What can I do to remove just the background? I want to retain the text as it is in the original image with clear-cut edges and no breaks or thinning.

You would get better results using a local threshold operation instead of a global one like Otsu.
But you should not expect too much. Smooth-looking edges are the result of gradient transitions between foreground and background. Within the same character you will most likely have pixels of the same value, some of which you would consider foreground and others background...
If you want better results you should improve the quality of your input image.
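As a minimal sketch of what a local threshold could look like with OpenCV's adaptiveThreshold (the block size and constant are guesses you would tune to your character size and image):

#include <opencv2/opencv.hpp>

int main()
{
    // Load the document image as grayscale.
    cv::Mat gray = cv::imread("document.png", cv::IMREAD_GRAYSCALE);

    // Local threshold: each pixel is compared against a Gaussian-weighted
    // mean of its 31x31 neighbourhood, minus a small constant.
    // THRESH_BINARY_INV makes dark text white on a black background.
    cv::Mat bw;
    cv::adaptiveThreshold(gray, bw, 255,
                          cv::ADAPTIVE_THRESH_GAUSSIAN_C,
                          cv::THRESH_BINARY_INV,
                          31,   // neighbourhood size (odd), tune for text size
                          10);  // constant subtracted from the mean, tune as well

    cv::imwrite("text_mask.png", bw);
    return 0;
}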

Related

Background Subtraction in OpenCV

I am trying to subtract two images using the absdiff function to extract a moving object. It works well, but sometimes the background appears in front of the foreground.
This usually happens when the background and foreground colors are similar. Is there any solution to overcome this problem?
The description of the problem above may not be enough, so I have attached images in the following link.
Thanks.
You can use some pre-processing techniques like edge detection and a contrast-stretching algorithm, which will give you extra information for subtracting the images. Even when the colors are similar, a new object should have texture features such as edges; if the edges are preserved properly, then when performing the image subtraction you will recover the object.
Process flow:
Use an edge detection algorithm.
Apply a contrast-stretching algorithm (such as histogram stretching).
Overlay the detected edges on top of the contrast-stretched image.
Now use the image subtraction algorithm from OpenCV (a rough sketch follows below).
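A rough sketch of that flow, assuming OpenCV in C++ and two grayscale images frame.png and background.png (the Canny thresholds and file names are placeholders):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat frame = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);
    cv::Mat background = cv::imread("background.png", cv::IMREAD_GRAYSCALE);

    // 1. Contrast stretching (simple min-max normalization per image).
    cv::Mat frameStretched, bgStretched;
    cv::normalize(frame, frameStretched, 0, 255, cv::NORM_MINMAX);
    cv::normalize(background, bgStretched, 0, 255, cv::NORM_MINMAX);

    // 2. Edge detection on each image (thresholds are guesses to tune).
    cv::Mat frameEdges, bgEdges;
    cv::Canny(frameStretched, frameEdges, 50, 150);
    cv::Canny(bgStretched, bgEdges, 50, 150);

    // 3. Overlay the edges on the contrast-stretched images so texture
    //    survives even where the colors are similar.
    cv::Mat frameCombined = cv::max(frameStretched, frameEdges);
    cv::Mat bgCombined = cv::max(bgStretched, bgEdges);

    // 4. Subtract as before.
    cv::Mat diff;
    cv::absdiff(frameCombined, bgCombined, diff);
    cv::imwrite("diff.png", diff);
    return 0;
}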
There isn't enough information to formulate a complete solution to your problem but there are some tips I can offer:
First, prefilter the input and background images using a strong median (or Gaussian) filter. This will make your results much more robust to image noise and to confusion from minor, non-essential detail (like the horizontal lines of your background image). Unless you want to detect a single moving strand of hair, you don't need to process the raw pixels.
Next, take the advice offered in the comments and test all 3 color channels instead of going straight to grayscale.
Then create a grayscale image from the max of the 3 absdiffs done on each channel.
Then perform your closing and opening procedure.
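Put together, those steps could look roughly like this (a sketch assuming OpenCV in C++; the blur size, threshold value and kernel size are placeholders to tune):

#include <opencv2/opencv.hpp>

cv::Mat movingObjectMask(const cv::Mat& frame, const cv::Mat& background)
{
    // Prefilter both images to suppress noise and fine detail.
    cv::Mat f, b;
    cv::medianBlur(frame, f, 5);
    cv::medianBlur(background, b, 5);

    // Per-channel absolute difference, then take the maximum over channels.
    std::vector<cv::Mat> fc, bc;
    cv::split(f, fc);
    cv::split(b, bc);
    cv::Mat diff = cv::Mat::zeros(frame.size(), CV_8UC1);
    for (int i = 0; i < 3; ++i)
    {
        cv::Mat d;
        cv::absdiff(fc[i], bc[i], d);
        diff = cv::max(diff, d);
    }

    // Threshold, then clean up with opening and closing.
    cv::Mat mask;
    cv::threshold(diff, mask, 30, 255, cv::THRESH_BINARY);   // 30 is a guess
    cv::Mat k = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN, k);
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, k);
    return mask;
}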
I don't know your requirements, so I can't take them into account. If accuracy is of the utmost importance, I'd use the median filter on the input image rather than the Gaussian. If speed is an issue, I'd scale the input images down by at least half for processing, then scale the result back up. If the camera is in a fixed position and you have a pre-calibrated background, then the current naive difference method should work. If the system has to detect movement in a real-world environment over an extended period of time (moving shadows, plants, vehicles, weather, etc.), then a rolling-average (or Gaussian) background model will work better. If the camera is moving, you will need to do a lot more processing, probably some optical flow and/or Fourier transform tests. All of these things need to be considered to provide the best solution for the application.

How can I quickly and reliably estimate blur severity in a photo of a document?

Suppose I have a 20 MP photo of some document, containing printed or handwritten text. Both the text and the background can be mildly distorted by shadows, halos from flash lighting or a lamp, etc.
I want to estimate the blur in the top half and in the bottom half of the image. Since printed (and hopefully handwritten) text has edges far sharper than general-purpose camera resolutions/settings can resolve, I assume the text-to-background boundaries are effectively infinitely sharp. I am thinking of detecting the minimum number (or 1st percentile) of pixels needed to cross from minBrightness+5% (text color) to maxBrightness-5% inside a local brightness window, because the dynamic range and lighting conditions change across different regions of the photo. So if it takes at best 3 pixels to cross from BlackPoint to WhitePoint, I would infer that my blur size is roughly 2 pixels.
There are a few problems with my idea. The algorithm I am thinking of seems much slower than a filter. It could give misleading results if I run it on a region that has no text at all (e.g. a document whose lower half is entirely blank), so it relies on a hardcoded minimum dynamic range (e.g. if maxBrightness - minBrightness < 100, there is no text; do not try to estimate blur). Thirdly, the algorithm does not seem very robust with regard to noise, shadows, etc., and it could fail if the actual text is not black-and-white but grayscale for aesthetic purposes.
Given my concerns, is there a fast and robust, moderately accurate algorithm that will do the task better than the algorithm I have in mind?
PS for now I am assuming uniform blur as opposed to directional blur because the direction of the blur is not central to my task.
Since your text should be sharp, it seems like a general "in focus" or blur detector might work, for example the approaches in "Is there a way to detect if an image is blurry?" and "Detection of Blur in Images/Video sequences", applied to sections of your image.
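For a quick start, a minimal sketch of the variance-of-the-Laplacian measure those links usually converge on (what counts as "blurry" is a cutoff you would have to calibrate on your own images):

#include <opencv2/opencv.hpp>

// Higher values mean a sharper region; low variance suggests blur.
double blurMeasure(const cv::Mat& gray)
{
    cv::Mat lap;
    cv::Laplacian(gray, lap, CV_64F);
    cv::Scalar mean, stddev;
    cv::meanStdDev(lap, mean, stddev);
    return stddev[0] * stddev[0];   // variance of the Laplacian response
}

// Usage: compare the top and bottom halves of the document photo.
// cv::Mat top = gray(cv::Rect(0, 0, gray.cols, gray.rows / 2));
// cv::Mat bottom = gray(cv::Rect(0, gray.rows / 2, gray.cols, gray.rows / 2));
// double topScore = blurMeasure(top), bottomScore = blurMeasure(bottom);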

How can I improve GrabCut when the bottom part of the image isn't part of the background?

When I run GrabCut on an image, I set its bounding box to the edges of the image itself.
However, since the bottom of the image is part of the foreground, not the background, GrabCut cuts out the lower part.
Is there any way of preventing this, such as setting the bounding box only along the top, left and right?
GrabCut needs the boundary to define what is "outside" so it can compute a background color model. Depending on your API/interface, you might be able to define "outside" using only the top, left and right parts of the image, leaving the bottom "inside".
Assuming you are using cv::grabCut, you may define the initial rect so that it extends past the bottom of the image; in that case the algorithm should not consider the lower part to be "obvious background".
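A sketch of that idea (the margins are placeholders; the point is only that the rect reaches past the bottom edge so the bottom rows are never treated as obvious background):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat image = cv::imread("person.jpg");

    // Leave a margin on top/left/right so some real background is outside
    // the rect, but let the rect run past the bottom edge of the image.
    cv::Rect rect(20, 20, image.cols - 40, image.rows + 100);

    cv::Mat mask, bgModel, fgModel;
    cv::grabCut(image, mask, rect, bgModel, fgModel, 5, cv::GC_INIT_WITH_RECT);

    // Keep definite and probable foreground.
    cv::Mat fg = (mask == cv::GC_FGD) | (mask == cv::GC_PR_FGD);
    cv::imwrite("foreground_mask.png", fg);
    return 0;
}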
As the background in this image is quite clean, you may try some simple rules to segment the foreground. For example, why not remove the "relatively" white regions? The background color can be heuristically extracted from the boundary region of the image.
If there is a cluttered background in your data set, you may try to detect a tighter bounding box for the person using detectors such as DPM or R-CNN; they already provide some powerful models for human detection. Based on the detected box, the GrabCut result should be better.
In any case, it would be helpful to provide some more examples for analysis.
The classic GrabCut implemented in OpenCV uses Gaussian mixture models with 6 centroids in 3D color space - more than you need to model the image above. Thus your problem is in setting the correct labels. You have to set not only a bounding box or ROI, but 3 regions with 3 labels - FG, BG, PROBABLY_BG (a partition of the image). The first two contribute to the foreground and background color models, and the boundary is refined only inside the region carrying the third label. In other words, you did not initialize GrabCut correctly with three labels. See, for example, How to set a mask image for grabCut in OpenCV?
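A hedged sketch of such a mask initialization; the rectangles marking "sure foreground" and "sure background" below are made up for illustration and would come from your own knowledge of the scene:

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat image = cv::imread("person.jpg");

    // Start with everything marked "probably background".
    cv::Mat mask(image.size(), CV_8UC1, cv::Scalar(cv::GC_PR_BGD));

    // Mark a band you are sure is background (here: the top rows, arbitrary).
    mask(cv::Rect(0, 0, image.cols, 30)).setTo(cv::Scalar(cv::GC_BGD));

    // Mark a region you are sure is foreground (a central block, arbitrary).
    mask(cv::Rect(image.cols / 3, image.rows / 3,
                  image.cols / 3, image.rows / 2)).setTo(cv::Scalar(cv::GC_FGD));

    cv::Mat bgModel, fgModel;
    cv::grabCut(image, mask, cv::Rect(), bgModel, fgModel, 5,
                cv::GC_INIT_WITH_MASK);

    cv::Mat fg = (mask == cv::GC_FGD) | (mask == cv::GC_PR_FGD);
    cv::imwrite("foreground_mask.png", fg);
    return 0;
}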

Alternative to threshold in OpenCV

I am using threshold in OpenCV to find contours. My input is a hand image. Sometimes the threshold is not good, so I couldn't find the contours.
I have applied the preprocessing steps below:
1. GrabCut
cv::grabCut(image, result, rectangle, bgModel, fgModel, 3, cv::GC_INIT_WITH_RECT);
2. Grayscale conversion
cvtColor(handMat, handMat, CV_BGR2GRAY);
3. Median blur
medianBlur(handMat, handMat, MEDIAN_BLUR_K);
I used the code below to compute the threshold (note the bitwise OR: with THRESH_OTSU set, the fixed value 141 is ignored and Otsu's threshold is used instead):
threshold(handMat, handMat, 141, 255, THRESH_BINARY | THRESH_OTSU);
Sometimes I get good output and sometimes the threshold output is not good. I have attached the two output images.
Is there any other way than threshold from which contours can be found?
Good threshold output:
Bad threshold output:
Have you tried an adaptive threshold? A single threshold value rarely works in real-life applications. Another truism: thresholding is a non-linear operation and hence not stable. Gradient, on the other hand, is linear, so you may want to find a contour by tracking the gradient if your background is smooth and a solid color. Gradient is also more reliable under illumination changes or shadows than thresholding.
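A rough sketch of the gradient idea, using Canny as a convenient edge/gradient map before contour tracking (all thresholds are placeholders to tune):

#include <opencv2/opencv.hpp>

std::vector<std::vector<cv::Point>> contoursFromGradient(const cv::Mat& gray)
{
    // Smooth a little so the gradient is not dominated by noise.
    cv::Mat blurred;
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 0);

    // Canny gives a thin, binary gradient map.
    cv::Mat edges;
    cv::Canny(blurred, edges, 40, 120);

    // Close small gaps so the hand outline becomes a connected curve.
    cv::Mat k = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::morphologyEx(edges, edges, cv::MORPH_CLOSE, k);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(edges, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    return contours;
}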
GrabCut, by the way, uses color information to improve segmentation on the boundary when you have already found 90% or so of the segment, so it is a post-processing step. Also, initializing GrabCut with a rectangle lets in a lot of contamination from background colors. Instead of a rectangle, use a mask: mark as GC_FGD the area deep inside your initial segment where you are sure the hand is; mark as GC_BGD the area far outside your segment where you are sure the background is; and mark everything else GC_PR_FGD (probably foreground) - this is the only region GrabCut will refine. To sum up, your GrabCut initialization should look like a russian doll with three layers indicating foreground (gray), probably foreground (white) and background (black). You can use dilate and erode to create these layers; see the sketch after the list below.
Overall my suggestion is to define what you want to do first. Are you looking for contours of arbitrary objects on an arbitrary moving background? If you are looking for the contour of a hand to find fingers on a relatively uniform background, I would:
1. Use connected components or MSER to segment out the hand. Possibly improve the result with GrabCut initialized with the conservative mask described above, not a rectangle!
2. Use convexity defects to find the fingers, if that is your goal.
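A rough sketch of that layered ("russian doll") initialization, assuming you already have a rough binary mask of the hand (called roughMask here) from connected components or MSER:

#include <opencv2/opencv.hpp>

// roughMask: 8-bit binary image, 255 where the hand roughly is.
cv::Mat buildGrabCutMask(const cv::Mat& roughMask)
{
    cv::Mat k = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(15, 15));

    // Shrink the rough segment to get pixels we are sure are hand,
    // grow it to get the zone that is only "probably" hand.
    cv::Mat sureFg, probablyFg;
    cv::erode(roughMask, sureFg, k, cv::Point(-1, -1), 2);
    cv::dilate(roughMask, probablyFg, k, cv::Point(-1, -1), 2);

    // Everything outside the dilated region is sure background.
    cv::Mat mask(roughMask.size(), CV_8UC1, cv::Scalar(cv::GC_BGD));
    mask.setTo(cv::Scalar(cv::GC_PR_FGD), probablyFg);   // middle layer
    mask.setTo(cv::Scalar(cv::GC_FGD), sureFg);          // innermost layer
    return mask;   // pass to cv::grabCut with cv::GC_INIT_WITH_MASK
}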
One option is to try to find the contours without binarizing the image.
If your input is in color, you can try to change the color space in order to enhance the difference between the hand and the background.
Otsu tries to find an optimal threshold; you can also set one manually, but Otsu is useful because if the illumination changes, the threshold adapts automatically.
There are also many other kinds of binarization: Sauvola, Bradley, Niblack, Kasar... but Otsu is simple and works well. I suggest you add preprocessing or postprocessing if you want to improve the binarization result.

Unevenly illuminated images

How do I get rid of uneven illumination in images that contain text data, usually printed but possibly handwritten? The images can have spots of light because of reflections when the picture was taken.
I've seen the Halcon program's segment_characters function do this work perfectly, but it is not open source.
I wish to convert an image to one that has constant background illumination and darker-colored text regions, so that binarization will be easy and free of noise.
The text is assumed to be darker than its background.
Any ideas?
Strictly speaking, assuming you have access to the image's pixels (you can search online for how to accomplish this in your programming language, as the topic is abundantly covered), the exercise involves going over the pixels once to determine a "darkness threshold". To do this you convert each pixel from RGB to HSL in order to get its lightness component. During this pass you calculate the average lightness of the whole image, which you can use as your darkness threshold.
Once you have the image's average lightness, you go over the pixels once more: if a pixel's lightness is below the darkness threshold, set its color to full black, RGB(0, 0, 0); otherwise, set it to full white, RGB(255, 255, 255). This gives you a binary image in which the text should be black and the rest white.
Of course, the key is finding an appropriate darkness threshold, so if the average method doesn't give you good results you may have to come up with a different way to compute it. Such a method could involve separating the image into the primary channels Red, Green and Blue, computing a darkness threshold for each channel separately, and then using the most aggressive of the three.
And lastly, a better approach may be to compute the distribution of lightness levels, as opposed to simply the average, and then keep the range around the maximum. Again, go over each pixel: if its lightness falls within that band, make it black; otherwise, make it white.
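A minimal sketch of the average-lightness approach, assuming OpenCV (whose HLS conversion stores lightness in the second channel):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat image = cv::imread("document.jpg");   // BGR input

    // Convert to HLS and take the lightness channel.
    cv::Mat hls;
    cv::cvtColor(image, hls, cv::COLOR_BGR2HLS);
    std::vector<cv::Mat> channels;
    cv::split(hls, channels);
    cv::Mat lightness = channels[1];

    // The average lightness acts as the "darkness threshold".
    double avg = cv::mean(lightness)[0];

    // Pixels darker than the average become black (text), the rest white.
    cv::Mat binary;
    cv::threshold(lightness, binary, avg, 255, cv::THRESH_BINARY);

    cv::imwrite("binary.png", binary);
    return 0;
}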
EDIT
For further reading about HSL, I recommend starting with the Wikipedia entry on the HSL and HSV color spaces.
Have you tried morphological techniques? Closing-by-reconstruction (as presented in Gonzalez, Woods and Eddins) can be used to create a grayscale estimate of the background illumination levels. You can more or less standardize the effective illumination by:
1) Calculating the mean intensity of all the pixels in the image
2) Using closing-by-reconstruction to estimate the background illumination levels
3) Subtracting the output of (2) from the original image
4) Adding the mean intensity from (1) to every pixel in the output of (3)
Basically, what closing-by-reconstruction does is remove all image features smaller than a certain size, erasing the "foreground" (the text you want to capture) and leaving only the "background" (the illumination levels) behind. Subtracting the result from the original image leaves behind only the small-scale deviations (the text). Adding the original average intensity to those deviations simply makes the text readable again, so that the resulting picture looks like a light-normalized version of the original image.
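OpenCV has no single closing-by-reconstruction call, but a plain morphological closing with a structuring element larger than the text strokes gives a rough version of the same background estimate; a hedged sketch (kernel size and file name are placeholders):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("document.jpg", cv::IMREAD_GRAYSCALE);

    // 1) Mean intensity of the original image.
    double meanIntensity = cv::mean(gray)[0];

    // 2) Estimate the background: a closing with a kernel larger than the
    //    text strokes removes the dark text and keeps the illumination.
    cv::Mat background;
    cv::Mat k = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(31, 31));
    cv::morphologyEx(gray, background, cv::MORPH_CLOSE, k);

    // 3) Subtract the background estimate, 4) add the mean back.
    cv::Mat normalized;
    cv::subtract(gray, background, normalized, cv::noArray(), CV_32F);
    normalized += meanIntensity;
    normalized.convertTo(normalized, CV_8U);   // clamp back to 0..255

    cv::imwrite("light_normalized.png", normalized);
    return 0;
}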
Use local thresholding instead of a global thresholding algorithm.
Divide your (grayscale) image into a grid of smaller images (say 50x50 px) and apply the thresholding algorithm to each individual tile.
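A rough sketch of that tiling idea, running Otsu independently on each 50x50 block (border tiles are simply whatever is left over):

#include <algorithm>
#include <opencv2/opencv.hpp>

cv::Mat tiledOtsu(const cv::Mat& gray, int tile = 50)
{
    cv::Mat result(gray.size(), CV_8UC1);
    for (int y = 0; y < gray.rows; y += tile)
    {
        for (int x = 0; x < gray.cols; x += tile)
        {
            // Clip the tile at the image border.
            cv::Rect roi(x, y,
                         std::min(tile, gray.cols - x),
                         std::min(tile, gray.rows - y));
            // Otsu picks a threshold per tile, so each tile adapts to its
            // own local illumination.
            cv::threshold(gray(roi), result(roi), 0, 255,
                          cv::THRESH_BINARY | cv::THRESH_OTSU);
        }
    }
    return result;
}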
If the background features are generally larger than the letters, you can try to estimate and subsequently remove the background.
There are many ways to do that; a very simple one is to run a median filter over your image. You want the filter window to be large enough that text inside the window rarely makes up more than a third of the pixels, but small enough that several windows fit inside the bright spots. This filter should produce an image without text, containing only the background. Subtract that from the original, and you should have an image that can be segmented with a global threshold.
Note that if the bright spots are much smaller than the text, you do the inverse: choose the filter window so that it removes only the light spots.
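A short sketch of that idea, assuming OpenCV (the 51-pixel window is a placeholder to tune against your text size):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("document.jpg", cv::IMREAD_GRAYSCALE);

    // Large median filter: the text disappears, only the background remains.
    cv::Mat background;
    cv::medianBlur(gray, background, 51);   // window size must be odd

    // The difference between background and original is large where text is.
    cv::Mat diff;
    cv::absdiff(background, gray, diff);

    // Now a single global threshold (e.g. Otsu) is enough.
    cv::Mat binary;
    cv::threshold(diff, binary, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    cv::imwrite("text_mask.png", binary);
    return 0;
}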
The first thing to try is to change the lighting: use a dome light or some other light that will give you a more diffuse and even illumination.
If that's not possible, you can try some of the ideas in this question or this one. You want to implement some type of "adaptive threshold"; this applies a local threshold to individual parts of the image so that the change in contrast won't be as noticeable.
There is also a simple but effective method explained here. The outline of the algorithm is the following:
Split the image up into NxN regions or neighbourhoods
Calculate the mean or median pixel value for each neighbourhood
Threshold the region based on the value calculated in step 2, or on that value minus C (where C is a chosen constant); see the sketch below
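That outline is essentially what cv::adaptiveThreshold with ADAPTIVE_THRESH_MEAN_C does in a single call; a sketch (N and C below are placeholders to tune):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("document.jpg", cv::IMREAD_GRAYSCALE);

    // Each pixel is compared against the mean of its NxN neighbourhood
    // minus the constant C, matching the three steps above.
    cv::Mat binary;
    cv::adaptiveThreshold(gray, binary, 255,
                          cv::ADAPTIVE_THRESH_MEAN_C,
                          cv::THRESH_BINARY,
                          25,   // N: neighbourhood size, must be odd
                          15);  // C: constant subtracted from the mean

    cv::imwrite("binary.png", binary);
    return 0;
}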
It seems like what you're trying to do is improve local contrast while attenuating larger-scale lighting variations. I'll agree with the other posters that optimizing the image through better lighting should always be the first move.
After that, here are two tricks.
1) Use the smooth_image() operator to convolve a Gaussian over your original image. Use a relatively large kernel, like 20-50 px. Then subtract this blurred image from your original image. Apply scale and offset within the sub_image() operator, or use equ_histo() to equalize the histogram.
This basically subtracts the low spatial frequency information from the original, leaving the higher frequency information intact.
2) You could try the highpass_image() operator, or one of the Laplacian operators, to extract a gradient image.
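smooth_image(), sub_image(), equ_histo() and highpass_image() are Halcon operators; if you are not using Halcon, a rough OpenCV equivalent of trick 1 could look like this (the kernel size and mid-gray offset are placeholders):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("document.jpg", cv::IMREAD_GRAYSCALE);

    // Low-pass version of the image: a large Gaussian kernel (~31 px here).
    cv::Mat lowpass;
    cv::GaussianBlur(gray, lowpass, cv::Size(31, 31), 0);

    // Subtract it to keep only the high spatial frequencies (the text),
    // then shift back to mid-gray so the result stays viewable.
    cv::Mat highpass;
    cv::subtract(gray, lowpass, highpass, cv::noArray(), CV_32F);
    highpass += 128.0;
    highpass.convertTo(highpass, CV_8U);

    // Optionally stretch the contrast, similar to equ_histo().
    cv::equalizeHist(highpass, highpass);

    cv::imwrite("local_contrast.png", highpass);
    return 0;
}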
