I have a large number of grayscale images that show bright "fibers" on a darker background. I am trying to quantify the "amount" of fibers. Since they overlap almost everywhere it will be impossible to count the number of fibers, so instead I want to resort to simply calculating how large the area fraction of the white fibers is compared to the full image (e.g. this one is 55% white, another one with less fibers is only 43% white, etc). In other words, I want to quantify the density of the fibers in the image.
Example pictures:
High density: https://dl.dropboxusercontent.com/u/14309718/f1.jpg
Lower density: https://dl.dropboxusercontent.com/u/14309718/f2.jpg
I figured a simple (adaptive) threshold filter would do the job nicely by just converting the image to purely black/white and then counting the fraction of white pixels. However, my answer seems to depend almost completely and only on the threshold value that I choose. I did some quick experiments by taking a large number of different thresholds and found that in all pictures the fraction of white pixels is almost exactly a linear function of the threshold value. In other words - I can get any answer I want between roughly 10% and 90% depending on the threshold I choose.
This is obviously not a good approach because my results are extremely biased with how I choose the threshold and therefore completely useless. Furthermore I have about 100 of these images and I'm not looking forward to trying to choose the "correct" threshold for all of them manually.
How can I improve this method?
As the images are complex and the outlines of the fibers are fuzzy, there is little hope of getting an "exact" measurement.
What matters then is to achieve repeatability, i.e. ensure that the same fiber density is always assigned the same measurement, even in varying lighting conditions if possible, and different densities are assigned different measurements.
This rules out human intervention in adjusting a threshold.
My best advice is to rely on Otsu thresholding, which is very good at finding meaningful background and foreground intensities and is fairly illumination-independent.
Enhancing the constrast before Otsu should be avoided because binarization commutes with contrast enhancement (so that there is no real benefit), but contrast enhancement can degrade the image by saturating at places.
Just echoing #YvesDaoust' thoughts really - and providing some concrete examples...
You can generate histograms of your images using ImageMagick which is installed on most Linux distros and is available for OSX and Windows. I am just doing this at the command-line but it is powerful and easy to run some tests and see how Yves' suggestion works for you.
# Make histograms for both images
convert 1.jpg histogram:h1.png
convert 2.jpg histogram:h2.png
Yes, they are fairly bimodal - so Otsu thresholding should find a threshold that maximises the between-class variance. Use the script otsuthresh from Fred Weinhaus' website here
./otsuthresh 1.jpg 1.gif
Thresholding Image At 44.7059%
./otsuthresh 2.jpg 2.gif
Thresholding Image At 42.7451%
Count percentage of white pixels in each image:
convert 1.gif -format "%[fx:int(mean*100)]" info:
50
convert 2.gif -format "%[fx:int(mean*100)]" info:
48
Not that brilliant a distinction! Mmmm... I tried adding in a median filter to reduce the noise, but that didn't help. Do you have your images available as PNG to avoid the nasty artefacts?
Related
I am trying to understand how to properly work with the RGB values found in PNM formats in order to inevitably convert them to Grayscale.
Researching the subject, it appears that if the RGB values are nonlinear, then I would need to first convert them to a linear RGB color space, apply my weights, and then convert them back to the same nonlinear color space.
There appears to be an expected format http://netpbm.sourceforge.net/doc/ppm.html:
In the raster, the sample values are "nonlinear." They are proportional to the intensity of the ITU-R Recommendation BT.709 red, green, and blue in the pixel, adjusted by the BT.709 gamma transfer function.
So I take it these values are nonlinear, but not sRGB. I found some thread topics around ImageMagick that say they might save them as linear RGB values.
Am I correct that PNM specifies a standard, but various editors like Photoshop or GIMP may or may not follow it?
From http://netpbm.sourceforge.net/doc/pamrecolor.html
When you use this option, the input and output images are not true Netpbm images, because the Netpbm image format specifies a particular color space. Instead, you are using a variation on the format in which the sample values in the raster have different meaning. Many programs that ostensibly use Netpbm images actually use a variation with a different color space. For example, GIMP uses sRGB internally and if you have GIMP generate a Netpbm image file, it really generates a variation of the format that uses sRGB.
Else where I see this http://netpbm.sourceforge.net/doc/pgm.html:
Each gray value is a number proportional to the intensity of the
pixel, adjusted by the ITU-R Recommendation BT.709 gamma transfer
function. (That transfer function specifies a gamma number of 2.2 and
has a linear section for small intensities). A value of zero is
therefore black. A value of Maxval represents CIE D65 white and the
most intense value in the image and any other image to which the image
might be compared.
BT.709's range of channel values (16-240) is irrelevant to PGM.
Note that a common variation from the PGM format is to have the gray
value be "linear," i.e. as specified above except without the gamma
adjustment. pnmgamma takes such a PGM variant as input and produces a
true PGM as output.
Most sources out there assume they are dealing with linear RGB and just apply their weights and save, possibly not preserving the luminance. I assume that any complaint renderer will assume that these RGB values are gamma compressed... thus technically displaying different grayscale "colors" than what I had specified. Is this correct? Maybe to ask it differently, does it matter? I know it is a loaded question, but if I can't really tell if it is linear or nonlinear, or how it has been compressed or expected to be compressed, will the image processing algorithms (binarization) be greatly effected if I just assume linear RGB values?
There may have been some confusion with my question, so I would like to answer it now that I have researched the situation much further.
To make a long story short... it appears like no one really bothers to re-encode an image's gamma when saving to PNM format. Because of that, since almost everything is sRGB, it will stay sRGB as opposed to the technically correct BT.709, as per the spec.
I reached out to Bryan Henderson of NetPBM. He held the same belief and stated that the method of gamma compression is not as import as knowing if it was applied or not and that we should always assume it is applied when working with PNM color formats.
To reaffirm the effect of that opinion in regard to image processing, please read "Color-to-Grayscale: Does the Method Matter in Image Recognition?", 2012 by Kanan and Cottrell. Basically if you calculate the Mean of the RGB values you will end up in one of three situations: Gleam, Intensity', or Intensity. After comparing the effects of different grayscale conversion formulas, taking into account when and how gamma correction was applied, he discovered that Gleam and Intensity' where the best performers. They differ only by when the gamma correction was added (Gleam has the gamma correction on the input RGB values, while Intensity' takes in linear RGB and applies gamma afterwords). Sadly you drop from 1st and 2nd place down to 8th when no gamma correction is added, aka Intensity. It's interesting to note that it was the simple Mean formula that worked the best, not one of the more popular grayscale formulas most people tout. All of that to say that if you use the Mean formula for converting PNM color to grayscale for image processing applications, you will ensure great performance since we can assume some gamma compression will have been applied. My comment about ImageMagick and linear values appears only to apply to their PGM format.
I hope that helps!
There is only one way good way to convert colour signal to greyscale: going to linear space and add light (and so colour intensities). In this manner you have effective light, and so you can calculate the brightness. Then you can "gamma" correct the value. This is the way light behave (linear space), and how the brightness was measured by CIE (by wavelength).
On television it is standard to build luma and then black and white images) from non-linear R,G,B. This is done because simplicity and the way analog colour television (NTSC and PAL) worked: black and white signal (for BW television) as main signal, and then adding colours (as subcarrier) to BW image. For this reason, the calculations are done in non linear space.
Video could use often such factors (on non-linear space), because it is much quick to calculate, and you can do it easily with integers (there are special matrix to use with integers).
For edge detection algorithms, it should not be important which method you are using: we have difficulty to detect edge with similar L or Y', so we do no care if computers have similar problem.
Note: our eyes are non linear on detecting light intensities, and with similar gamma as phosphors on our old televisions. For this reason using gamma corrected value is useful: it compress the information in a optimal way (or in "analog-TV" past: it reduce perceived noise).
So you if you want Y', do with non linear R',G',B'. But if you need real grey scale, you need to calculate real greyscale going to linear space.
You may see differences especially on mid-greys, and on purple or yellow, where two of R,G,B are nearly the same (and as maximum value between the three).
But on photography programs, there are many different algorithms to convert RGB to greyscale: we do not see the world in greyscale, so different weight (possibly non linear) could help to make out some part of image, which it is the purpose of greyscale photos (by remove distracting colours).
Note Rec.709 never specified the gamma correction to apply (the OETF on the standard is not useful, we need EOTF, and often one is not the inverse of the other, for practical reasons). Only on a successive recommendation this missing information were finally provided. But because many people speak about Rec.709, the inverse of OETF is used as gamma, which it is incorrect.
How to detect: classical yellow sun on blue sky, choosing yellow and blue with same L. If you see sun in grey image, you are transforming with non-linear space (Y' is not equal). If you do no see the sun, you transform linearly.
I used Imagemagick in my project. I implemented a sub-image detection system using the compare command of ImageMagick. It is working well giving fine results. By reading articles i got to know that ImageMagick compares pixels of small image at every possible position within the pixels of larger image.And also i got to know ImageMagick detects rotated images and scaled images using Fuzzy factor.Though i have an rough idea about how the algorithm behave i couldn't find any article related to the algorithms of ImageMagick. Any idea about how this algorithm of compare command actually works?
AT my job I'm currently working on a tool which does some sophisticated optimizations on PDF pages when they're too heavy (too much vectors etc), but sometimes whites out rectangular parts of the image due to the drawing order of PDF objects.
I decided to use the compare tool of ImageMagick v6 (retrocompatibility issues forbid v7 for the moment) to check the page rendering after treatment against original rendering and detect when there have been accidents.
I have tested the available -metrics parameters on pages where the rendering was nearly identical to the original, while there was one page where white out parts were happening, making renderings very different.
I used -fuzz 10% to accept a minor color variation, and used 99 for JPG quality in my tool so JPEG compression doesn't generate too much differences. Beware on this one, as a low quality on jpg compression forces you to augment the fuzz factor, with the risk of missing major visual differences. Unfortunately, you won't find the information in the JPG headers.
I have made my images mid-res (150 dpi), because lo-res proved similar to low JPG quality and generated too much differences. The renderings are same resolution (549*819 = 449 631 pixels), the compare tool is not really strong at finding part of an image in another image. (You'd better off with OpenCV for this.)
Here is a table with some significant results on three different pages in my tool, and then my interpretation of each metric.
The AE stands for absolute error count. This metric roughly gives the number of pixels considered as different, within the 10% fuzz acceptation that I have used. On nearly identical renderings, this value is typically very low, only 18 or 1400 pixels on a total of 450K, while the very different images show nearly 40K different pixels. I think this metric can be checked against a low percentage of total pixels, but in my case this is not distinctive enough. Say I have 1% = 4500 pixels, it could be not neglectable if they happen in only one rectangle. It could be usefull with a geographical dispersion factor but this is more an OpenCV job.
The MEPP is Mean Error Per Pixel. As with all mean statistics it's not obvious to interpret, but the results are quite distinctive. Jumping from about 6K for same images to a huge 1.7M on different images. The problem is to decide a limit value. You can see in my table I had a page with 470K on this metric but it was a visually acceptable rendering artifact. What is an acceptable value, then? As often with mean values, significancy can prove to be very arbitrary and not always appropriate. The only way to find a reasonnable limit is to make a lot of measures on significant cases, maybe using machine learing.
MSE metric is mean error squared, average of the channel error squared. Squared delta values are often more significant in statistics, because they lower the minor differences and accentuate the major ones. (Linear regression correlation factor benefits from this mathematical behaviour.) This metric proves very interesting in my case because values are very consistent for similar cases : even on the page with a visual artifact and 450K different pixels, the value stays well below 1, while the page with whited out parts jumps to 9. This MSE metric is clearly very useable in my case.
NCC means normalized cross correlation. Normalized values are not appropriate in my case, as they make colors closer and diminish the differences between different cases. You can see this on values, although identical rendering are remarquably close to 1, different renderings have a value of 0.86 which is not so far from 1.
PAE gives you the absolute peak difference on color channels on all pixels, so it doesn't tell how many pixels are different. And it doesn't work well for RGB or CMYK as it doesn't tell you which channels are concerned. It's no use in my case.
PHASH is perceptual hash for the sRGB and HCLp colorspaces. I'm not really sure about this one but the result are very interesting as my images are RGB, and mostly because the values on similar rendering are well below 1 even for the page with acceptable visual artifact - while the different renderings give a value of 26. This is very appropriate for my case, just as MSE. I consider this metric as it seems to even more accentuate the differences between minor and major gaps in color values.
PSNR is peak signal to noise ratio : as for PAE, this is a peak value which won't tell how many pixels are concerned so it is absolutely unseable in my case.
RMSE is for root mean squared. Whatever that means, you find the same squared means behaviour as with MSE, so this value can be interesting as it reveals major differences more than minor ones. Here my values are 0.7 and 3.2 for identical images and a hefty 48.7 for different ones. The difficulty is to decide for an acceptable limit: as with AE or MEPP, the only way is to conduct a lot of tests and measure values for a lot of cases and decide for an appropriate value. Machine learning could help for this.
As a conclusion, I decided to use the PHASH metric. But the very conclusion of this study is that you must conduct tests and measure them before deciding the metric to use, as comparison contextes can be very different and metrics can show very different behaviours.
OpenCV is way more appropriate if you have to compare images from different sources, or parts of a global image, or images with big lighting variations. Also, it's quite easy to use from Python and C++. ImageMagick is quite good in my case because I am the one who generates both images.
The fuzz factor in ImageMagick allows two pixels to be compared and considered as the same although their colours may differ slightly.
The trick to understanding it, is to consider an RGB colour cube with Red, Green, Blue, Cyan, Magenta, Yellow and Black and White as the vertices. A fuzz factor of 100% represents the greatest possible distance in that cube, i.e. the length of the diagonal from Black to White and everything is scaled relative to that. It is shown dotted in this diagram.
In general, I would recommend using a percentage value rather than an absolute value, because an absolute fuzz factor of 255 means all colours are the same (black=white) on an 8-bit image, whereas on a 16-bit image, it would be hard to even perceive two colours that differ by 255.
As an example, let's see if a single black pixel is the same as a single mid-grey pixel with 49% fuzz:
compare -metric ae -fuzz 49% xc:black xc:gray null:
1
No, it is different, there is one pixel difference. Now let's try again allowing the pixels to be 51% different yet still match:
compare -metric ae -fuzz 51% xc:black xc:gray null:
0
Now they are considered the same.
Hey I need a 320x240 8Bit gray scale image for some Computer Vision Algorithm (Orb Feature tracking). The Raspicam driver I'm using can provide different Image Sizes. Different Image Sizes are achieved by cropping and not down sampling from the driver. As my environment is not ideal lighted the Image is quite dark and noisy. Now I had the idea to take a 640x480 Image and down sample it to 320x240 by combining always 2x2 pixels to one. Normally I would of course divide by 4 to get the correct result. But what would be the effect of dividing it by two or even one (assuming 99% of the intensity values are not bigger then 64 (256/4)). Wouldn't that simulate the effect of larger CCD cells which could gather more light in less time.
The first tests I did showed some pretty good results. Meaning I detected more Features and could follow them better between two frames.
Here, you are not taking proper average of 2x2 blocks(divide by 4). Say, you have two blocks and they have Delta-I difference in intensity. If you divide the intensity of the two blocks by a larger number, the intensity difference will reduce and vice-versa for smaller number.
When you divide the difference(Delta-I) by 2(instead of 4), you are in a way increasing the contrast(intensity difference between background and foreground. As you mentioned that your image is in poor illumination, thereby division by smaller number increases the contrast which is improving tracking. This approach will come under contrast enhancement technique and is a variation of Linear contrast enhancement.
I think this can be a stupid question but after read a lot and search a lot about image processing every example I see about image processing uses gray scale to work
I understood that gray scale images use just one channel of color, that normally is necessary just 8 bit to be represented, etc... but, why use gray scale when we have a color image? What are the advantages of a gray scale? I could imagine that is because we have less bits to treat but even today with faster computers this is necessary?
I am not sure if I was clear about my doubt, I hope someone can answer me
thank you very much
As explained by John Zhang:
luminance is by far more important in distinguishing visual features
John also gives an excellent suggestion to illustrate this property: take a given image and separate the luminance plane from the chrominance planes.
To do so you can use ImageMagick separate operator that extracts the current contents of each channel as a gray-scale image:
convert myimage.gif -colorspace YCbCr -separate sep_YCbCr_%d.gif
Here's what it gives on a sample image (top-left: original color image, top-right: luminance plane, bottom row: chrominance planes):
To elaborate a bit on deltheil's answer:
Signal to noise. For many applications of image processing, color information doesn't help us identify important edges or other features. There are exceptions. If there is an edge (a step change in pixel value) in hue that is hard to detect in a grayscale image, or if we need to identify objects of known hue (orange fruit in front of green leaves), then color information could be useful. If we don't need color, then we can consider it noise. At first it's a bit counterintuitive to "think" in grayscale, but you get used to it.
Complexity of the code. If you want to find edges based on luminance AND chrominance, you've got more work ahead of you. That additional work (and additional debugging, additional pain in supporting the software, etc.) is hard to justify if the additional color information isn't helpful for applications of interest.
For learning image processing, it's better to understand grayscale processing first and understand how it applies to multichannel processing rather than starting with full color imaging and missing all the important insights that can (and should) be learned from single channel processing.
Difficulty of visualization. In grayscale images, the watershed algorithm is fairly easy to conceptualize because we can think of the two spatial dimensions and one brightness dimension as a 3D image with hills, valleys, catchment basins, ridges, etc. "Peak brightness" is just a mountain peak in our 3D visualization of the grayscale image. There are a number of algorithms for which an intuitive "physical" interpretation helps us think through a problem. In RGB, HSI, Lab, and other color spaces this sort of visualization is much harder since there are additional dimensions that the standard human brain can't visualize easily. Sure, we can think of "peak redness," but what does that mountain peak look like in an (x,y,h,s,i) space? Ouch. One workaround is to think of each color variable as an intensity image, but that leads us right back to grayscale image processing.
Color is complex. Humans perceive color and identify color with deceptive ease. If you get into the business of attempting to distinguish colors from one another, then you'll either want to (a) follow tradition and control the lighting, camera color calibration, and other factors to ensure the best results, or (b) settle down for a career-long journey into a topic that gets deeper the more you look at it, or (c) wish you could be back working with grayscale because at least then the problems seem solvable.
Speed. With modern computers, and with parallel programming, it's possible to perform simple pixel-by-pixel processing of a megapixel image in milliseconds. Facial recognition, OCR, content-aware resizing, mean shift segmentation, and other tasks can take much longer than that. Whatever processing time is required to manipulate the image or squeeze some useful data from it, most customers/users want it to go faster. If we make the hand-wavy assumption that processing a three-channel color image takes three times as long as processing a grayscale image--or maybe four times as long, since we may create a separate luminance channel--then that's not a big deal if we're processing video images on the fly and each frame can be processed in less than 1/30th or 1/25th of a second. But if we're analyzing thousands of images from a database, it's great if we can save ourselves processing time by resizing images, analyzing only portions of images, and/or eliminating color channels we don't need. Cutting processing time by a factor of three to four can mean the difference between running an 8-hour overnight test that ends before you get back to work, and having your computer's processors pegged for 24 hours straight.
Of all these, I'll emphasize the first two: make the image simpler, and reduce the amount of code you have to write.
I disagree with the implication that gray scale images are always better than color images; it depends on the technique and the overall goal of the processing. For example, if you wanted to count the bananas in an image of a fruit bowl image, then it's much easier to segment when you have a colored image!
Many images have to be in grayscale because of the measuring device used to obtain them. Think of an electron microscope. It's measuring the strength of an electron beam at various space points. An AFM is measuring the amount of resonance vibrations at various points topologically on a sample. In both cases, these tools are returning a singular value- an intensity, so they implicitly are creating a gray-scale image.
For image processing techniques based on brightness, they often can be applied sufficiently to the overall brightness (grayscale); however, there are many many instances where having a colored image is an advantage.
Binary might be too simple and it could not represent the picture character.
Color might be too much and affect the processing speed.
Thus, grayscale is chosen, which is in the mid of the two ends.
First of starting image processing whether on gray scale or color images, it is better to focus on the applications which we are applying. Unless and otherwise, if we choose one of them randomly, it will create accuracy problem in our result. For example, if I want to process image of waste bin, I prefer to choose gray scale rather than color. Because in the bin image I want only to detect the shape of bin image using optimized edge detection. I could not bother about the color of image but I want to see rectangular shape of the bin image correctly.
I am trying to teach my camera to be a scanner: I take pictures of printed text and then convert them to bitmaps (and then to djvu and OCR'ed). I need to compute a threshold for which pixels should be white and which black, but I'm stymied by uneven illumination. For example if the pixels in the center are dark enough, I'm likely to wind up with a bunch of black pixels in the corners.
What I would like to do, under relatively simple assumptions, is compensate for uneven illumination before thresholding. More precisely:
Assume one or two light sources, maybe one with gradual change in light intensity across the surface (ambient light) and another with an inverse square (direct light).
Assume that the white parts of the paper all have the same reflectivity/albedo/whatever.
Find some algorithm to estimate degree of illumination at each pixel, and from that recover the reflectivity of each pixel.
From a pixel's reflectivity, classify it white or black
I have no idea how to write an algorithm to do this. I don't want to fall back on least-squares fitting since I'd somehow like to ignore the dark pixels when estimating illumination. I also don't know if the algorithm will work.
All helpful advice will be upvoted!
EDIT: I've definitely considered chopping the image into pieces that are large enough so they still look like "text on a white background" but small enough so that illumination of a single piece is more or less even. I think if I then interpolate the thresholds so that there's no discontinuity across sub-image boundaries, I will probably get something halfway decent. This is a good suggestion, and I will have to give it a try, but it still leaves me with the problem of where to draw the line between white and black. More thoughts?
EDIT: Here are some screen dumps from GIMP showing different histograms and the "best" threshold value (chosen by hand) for each histogram. In two of the three a single threshold for the whole image is good enough. In the third, however, the upper left corner really needs a different threshold:
I'm not sure if you still need a solution after all this time, but if you still do. A few years ago I and my team photographed about 250,000 pages with a camera and converted them to (almost black and white ) grey scale images which we then DjVued ( also make pdfs of).
(See The catalogue and complete collection of photographic facsimiles of the 1144 paper transcripts of the French Institute of Pondicherry.)
We also ran into the problem of uneven illumination. We came up with a simple unsophisticated solution which worked very well in practice. This solution should also work to create black and white images rather than grey scale (as I'll describe).
The camera and lighting setup
a) We taped an empty picture frame to the top of a table to keep our pages in the exact same position.
b) We put a camera on a tripod also on top of the table above and pointing down at the taped picture frame and on a bar about a foot wide attached to the external flash holder on top of the camera we attached two "modelling lights". These can be purchased at any good camera shop. They are designed to provide even illumination. The camera was shaded from the lights by putting small cardboard box around each modelling light. We photographed in greyscale which we then further processed. (Our pages were old browned paper with blue ink writing so your case should be simpler).
Processing of the images
We used the free software package irfanview.
This software has a batch mode which can simultaneously do color correction, change the bit depth and crop the images. We would take the photograph of a page and then in interactive mode adjust the brightness, contrast and gamma settings till it was close to black and white. (We used greyscale but by setting the bit depth to 2 you will get black and white when you batch process all the pages.)
After determining the best color correction we then interactively cropped a single image and noted the cropping settings. We then set all these settings in the batch mode window and processed the pages for one book.
Creating DjVu images.
We used the free DjVu Solo 3.1 to create the DjVu images. This has several modes to create the DjVu images. The mode which creates black and white images didn't work well for us for photographs, but the "photo" mode did.
We didn't OCR (since the images were handwritten Sanskrit) but as long as the letters are evenly illuminated I think your OCR software should ignore big black areas like between a two page spread. But you can always get rid of the black between a two page spread or at the edges by cropping the pages twices once for the left hand pages and once for the right hand pages and the irfanview software will allow you to cleverly number your pages so you can then remerge the pages in the correct order. I.e rename your pages something like page-xxxA for lefthand pages and page-xxxB for righthand pages and the pages will then sort correctly on name.
If you still need a solution I hope some of the above is useful to you.
i would recommend calibrating the camera. considering that your lighting setup is fixed (that is the lights do not move between pictures), and your camera is grayscale (not color).
take a picture of a white sheet of paper which covers the whole workable area of your "scanner". store this picture, it tells what is white paper for each pixel. now, when you take take a picture of a document to scan, you can reload your "white reference picture" and even the illumination before performing a threshold.
let's call the white reference REF, the picture DOC, the even illumination picture EVEN, and the maximum value of a pixel MAX (for 8bit imaging, it is 255). for each pixel:
EVEN = DOC * (MAX/REF)
notes:
beware of the parenthesis: most image processing library uses the image pixel type for performing computation on pixel values and a simple multiplication will overload your pixel. eventually, write the loop yourself and use a 32 bit integer for intermediate computations.
the white reference image can be smoothed before being used in the process. any smoothing or blurring filter will do, and don't hesitate to apply it aggressively.
the MAX value in the formula above represents the target pixel value in the resulting image. using the maximum pixel value targets a bright white, but you can adjust this value to target a lighter gray.
Well. Usually the image processing I do is highly time sensitive, so a complex algorithm like the one you're seeking wouldn't work. But . . . have you considered chopping the image up into smaller pieces, and re-scaling each sub-image? That should make the 'dark' pixels stand out fairly well even in an image of variable lighting conditions (I am assuming here that you are talking about a standard mostly-white page with dark text.)
Its a cheat, but a lot easier than the 'right' way you're suggesting.
This might be horrendously slow, but what I'd recommend is to break the scanned surface into quarters/16ths and re-color them so that the average grayscale level is similar across the page. (Might break if you have pages with large margins though)
I assume that you are taking images of (relatively) small black letters on a white background.
One approach could be to "remove" the small black objects, while keeping the illumination variations of the background. This gives an estimate of how the image is illuminated, which can be used for normalizing the original image. It is often enough to subtract the illumination estimate from the original image and then do a threshold based segmentation.
This approach is based on gray scale morphological filters, and could be implemented in matlab like below:
img = imread('filename.png');
illumination = imclose(img, strel('disk', 10));
imgCorrected = img - illumination;
thresholdValue = graythresh(imgCorrected);
bw = imgCorrected > thresholdValue;
For an example with real images take a look at this guide from mathworks. For further reading about the use of morphological image analysis this book by Pierre Soille can be recommended.
Two algorithms come to my mind:
High-pass to alleviate the low-frequency illumination gradient
Local threshold with an appropriate radius
Adaptive thresholding is the keyword. Quote from a 2003 article by R.
Fisher, S. Perkins, A. Walker, and E. Wolfart: “This more sophisticated version
of thresholding can accommodate changing lighting conditions in the image, e.g.
those occurring as a result of a strong illumination gradient or shadows.”
ImageMagick's -lat option can do it, for example:
convert -lat 50x50-2000 input.jpg output.jpg
input.jpg
output.jpg
You could try using an edge detection filter, then a floodfill algorithm, to distinguish the background from the foreground. Interpolate the floodfilled region to determine the local illumination; you may also be able to modify the floodfill algorithm to use the local background value to jump across lines and fill boxes and so forth.
You could also try a Threshold Hysteresis with a rate of change control. Here is the link to the normal Threshold Hysteresis. Set the first threshold to a typical white value. Set the second threshold to less than the lowest white value in the corners.
The difference is that you want to check the difference between pixels for all values in between the first and second threshold. Ideally if the difference is positive, then act normally. But if it is negative, you only want to threshold if the difference is small.
This will be able to compensate for lighting variations, but will ignore the large changes between the background and the text.
Why don't you use simple opening and closing operations?
Try this, just lool at the results:
src - cource image
src - open(src)
close(src) - src
and look at the close - src result
using different window size, you will get backgound of the image.
I think this helps.