why we should use gray scale for image processing - image-processing

I think this can be a stupid question but after read a lot and search a lot about image processing every example I see about image processing uses gray scale to work
I understood that gray scale images use just one channel of color, that normally is necessary just 8 bit to be represented, etc... but, why use gray scale when we have a color image? What are the advantages of a gray scale? I could imagine that is because we have less bits to treat but even today with faster computers this is necessary?
I am not sure if I was clear about my doubt, I hope someone can answer me
thank you very much

As explained by John Zhang:
luminance is by far more important in distinguishing visual features
John also gives an excellent suggestion to illustrate this property: take a given image and separate the luminance plane from the chrominance planes.
To do so you can use ImageMagick separate operator that extracts the current contents of each channel as a gray-scale image:
convert myimage.gif -colorspace YCbCr -separate sep_YCbCr_%d.gif
Here's what it gives on a sample image (top-left: original color image, top-right: luminance plane, bottom row: chrominance planes):

To elaborate a bit on deltheil's answer:
Signal to noise. For many applications of image processing, color information doesn't help us identify important edges or other features. There are exceptions. If there is an edge (a step change in pixel value) in hue that is hard to detect in a grayscale image, or if we need to identify objects of known hue (orange fruit in front of green leaves), then color information could be useful. If we don't need color, then we can consider it noise. At first it's a bit counterintuitive to "think" in grayscale, but you get used to it.
Complexity of the code. If you want to find edges based on luminance AND chrominance, you've got more work ahead of you. That additional work (and additional debugging, additional pain in supporting the software, etc.) is hard to justify if the additional color information isn't helpful for applications of interest.
For learning image processing, it's better to understand grayscale processing first and understand how it applies to multichannel processing rather than starting with full color imaging and missing all the important insights that can (and should) be learned from single channel processing.
Difficulty of visualization. In grayscale images, the watershed algorithm is fairly easy to conceptualize because we can think of the two spatial dimensions and one brightness dimension as a 3D image with hills, valleys, catchment basins, ridges, etc. "Peak brightness" is just a mountain peak in our 3D visualization of the grayscale image. There are a number of algorithms for which an intuitive "physical" interpretation helps us think through a problem. In RGB, HSI, Lab, and other color spaces this sort of visualization is much harder since there are additional dimensions that the standard human brain can't visualize easily. Sure, we can think of "peak redness," but what does that mountain peak look like in an (x,y,h,s,i) space? Ouch. One workaround is to think of each color variable as an intensity image, but that leads us right back to grayscale image processing.
Color is complex. Humans perceive color and identify color with deceptive ease. If you get into the business of attempting to distinguish colors from one another, then you'll either want to (a) follow tradition and control the lighting, camera color calibration, and other factors to ensure the best results, or (b) settle down for a career-long journey into a topic that gets deeper the more you look at it, or (c) wish you could be back working with grayscale because at least then the problems seem solvable.
Speed. With modern computers, and with parallel programming, it's possible to perform simple pixel-by-pixel processing of a megapixel image in milliseconds. Facial recognition, OCR, content-aware resizing, mean shift segmentation, and other tasks can take much longer than that. Whatever processing time is required to manipulate the image or squeeze some useful data from it, most customers/users want it to go faster. If we make the hand-wavy assumption that processing a three-channel color image takes three times as long as processing a grayscale image--or maybe four times as long, since we may create a separate luminance channel--then that's not a big deal if we're processing video images on the fly and each frame can be processed in less than 1/30th or 1/25th of a second. But if we're analyzing thousands of images from a database, it's great if we can save ourselves processing time by resizing images, analyzing only portions of images, and/or eliminating color channels we don't need. Cutting processing time by a factor of three to four can mean the difference between running an 8-hour overnight test that ends before you get back to work, and having your computer's processors pegged for 24 hours straight.
Of all these, I'll emphasize the first two: make the image simpler, and reduce the amount of code you have to write.

I disagree with the implication that gray scale images are always better than color images; it depends on the technique and the overall goal of the processing. For example, if you wanted to count the bananas in an image of a fruit bowl image, then it's much easier to segment when you have a colored image!
Many images have to be in grayscale because of the measuring device used to obtain them. Think of an electron microscope. It's measuring the strength of an electron beam at various space points. An AFM is measuring the amount of resonance vibrations at various points topologically on a sample. In both cases, these tools are returning a singular value- an intensity, so they implicitly are creating a gray-scale image.
For image processing techniques based on brightness, they often can be applied sufficiently to the overall brightness (grayscale); however, there are many many instances where having a colored image is an advantage.

Binary might be too simple and it could not represent the picture character.
Color might be too much and affect the processing speed.
Thus, grayscale is chosen, which is in the mid of the two ends.

First of starting image processing whether on gray scale or color images, it is better to focus on the applications which we are applying. Unless and otherwise, if we choose one of them randomly, it will create accuracy problem in our result. For example, if I want to process image of waste bin, I prefer to choose gray scale rather than color. Because in the bin image I want only to detect the shape of bin image using optimized edge detection. I could not bother about the color of image but I want to see rectangular shape of the bin image correctly.

Related

What Kind of pre-processing techniques can I apply to make an object more clear?

I applied few techniques of denoising on MRI images and could not realize what techniques are applicable on my data to make the cartilage object more clear. First I applied Contrast-limited adaptive histogram equalization (CLAHE) with this function:
J = adapthisteq(I)
But I got a white image. This is original image and manual segmentation of two thin objects(cartilage):
And then I read a paper that they had used some preprocessing on microscopy images, such as: Anisotropic diffusion filter(ADF), then, K-SVD algorithm, and then Batch-Orthogonal Matching Pursuit (OMP). I applied the first two and the output is as following:
It seems my object is not clear. It should be brighter than other objects. I do not what kind of algorithms are applicable to make the cartilage objects more clear. I really appreciate any help.
Extended:
This is the object:
Edited (now knowing exactly what you are looking for)
The differences between your cartilage and the surrounding tissue is very slight and for that reason I do not think you can afford to do any filtration. What I mean by this is that the two things that I can kinda catch with my eye is that the edge on the cartilage is very sharp (the grey to black drop-off), and also there seems to be a texture regularity in the cartilage that is smoother than the rest of the image. To be honest, these features are incredibly hard to even pick out by eye, and a common rule of thumb is that if you can't do it with your eye, vision processing is going to be rough.
I still think you want to do histogram stretching to increase your contrast.
1:In order to do a clean global contrast stretch you will need to remove bone/skin edge/ whatever that line on the left is from the image (bright white). To do this, I would suggest looking at the intensity histogram and setting a cut-off after the first peak (make sure to limit this so some value well above what cartilage could be in case there is no white signal). After determining that value, cut all pixels above that intensity from the image.
2:There appears to be low frequency gradients in this image (the background seems to vary in intensity), global histogram management (normalization) doesn't handle this well, CLAHE can handle this if set up well. But a far simpler solution worth trying is just hitting the image with a high pass filter as this will help to remove some of those (low frequency) background shifts. (after this step you should see no bulk intensity variation across the image.
3: I think you should try various implementations of histogram stretching, your goal in your histogram stretch implementation is to make the cartilage look more unique in the image compared to all other tissue.
This is by far the hardest step as you need to actually take a stab at what makes that tissue different from the rest of the tissue. I am at work, but when I get off, I will try to brainstorm some concepts for this final segmentation step here. In the meantime, what you want to try to identify is anything unique about the cartilage tissue at this point. My top ideas are cylindrical style color gradient, surface roughness, edge sharpness, location, size/shape

Find "fraction of bright pixels" in image (thresholding?)

I have a large number of grayscale images that show bright "fibers" on a darker background. I am trying to quantify the "amount" of fibers. Since they overlap almost everywhere it will be impossible to count the number of fibers, so instead I want to resort to simply calculating how large the area fraction of the white fibers is compared to the full image (e.g. this one is 55% white, another one with less fibers is only 43% white, etc). In other words, I want to quantify the density of the fibers in the image.
Example pictures:
High density: https://dl.dropboxusercontent.com/u/14309718/f1.jpg
Lower density: https://dl.dropboxusercontent.com/u/14309718/f2.jpg
I figured a simple (adaptive) threshold filter would do the job nicely by just converting the image to purely black/white and then counting the fraction of white pixels. However, my answer seems to depend almost completely and only on the threshold value that I choose. I did some quick experiments by taking a large number of different thresholds and found that in all pictures the fraction of white pixels is almost exactly a linear function of the threshold value. In other words - I can get any answer I want between roughly 10% and 90% depending on the threshold I choose.
This is obviously not a good approach because my results are extremely biased with how I choose the threshold and therefore completely useless. Furthermore I have about 100 of these images and I'm not looking forward to trying to choose the "correct" threshold for all of them manually.
How can I improve this method?
As the images are complex and the outlines of the fibers are fuzzy, there is little hope of getting an "exact" measurement.
What matters then is to achieve repeatability, i.e. ensure that the same fiber density is always assigned the same measurement, even in varying lighting conditions if possible, and different densities are assigned different measurements.
This rules out human intervention in adjusting a threshold.
My best advice is to rely on Otsu thresholding, which is very good at finding meaningful background and foreground intensities and is fairly illumination-independent.
Enhancing the constrast before Otsu should be avoided because binarization commutes with contrast enhancement (so that there is no real benefit), but contrast enhancement can degrade the image by saturating at places.
Just echoing #YvesDaoust' thoughts really - and providing some concrete examples...
You can generate histograms of your images using ImageMagick which is installed on most Linux distros and is available for OSX and Windows. I am just doing this at the command-line but it is powerful and easy to run some tests and see how Yves' suggestion works for you.
# Make histograms for both images
convert 1.jpg histogram:h1.png
convert 2.jpg histogram:h2.png
Yes, they are fairly bimodal - so Otsu thresholding should find a threshold that maximises the between-class variance. Use the script otsuthresh from Fred Weinhaus' website here
./otsuthresh 1.jpg 1.gif
Thresholding Image At 44.7059%
./otsuthresh 2.jpg 2.gif
Thresholding Image At 42.7451%
Count percentage of white pixels in each image:
convert 1.gif -format "%[fx:int(mean*100)]" info:
50
convert 2.gif -format "%[fx:int(mean*100)]" info:
48
Not that brilliant a distinction! Mmmm... I tried adding in a median filter to reduce the noise, but that didn't help. Do you have your images available as PNG to avoid the nasty artefacts?

Matlab image processing - replacing dark pixels with neighboring pixels

I am doing some image processing of the retina images.. I need to replace the blood vessels with background pixels so that I can focus on other aspects of retina. I could not figure out a way to do this. I am using matlab. any suggestions?
Having worked extensively with retinal images, I can tell you that what you're proposing is a complex problem in itself. Sure, if you just want a crude method, you can use imdilate. But that will affect your entire image, and other structures in the image will change appearance. Something, that is not desirable.
However, if you want to do it properly, you will first need to segment all the blood vessels and create a binary mask. Once you have a binary mask, it's up to you how to fill up the vessel regions. You can either interpolate from the boundaries or calculate a background image and replace the vessel regions with pixels from the background image, etc.
Segmentation of the blood vessels is a challenging problem and you will find a lot of literature concerning that on the internet. Ultimately, you will have to choose how accurate a segmentation you want and build your algorithm accordingly.
imdilate should do what you want, since it replaces each pixel with the maximum of its neighbors. For more detailed suggestions, I'd need to see images.

Is HSL Superior over HSI and HSV Color Spaces?

Is HSL superior over HSI and HSV, because it takes human perception into account.?
For some image processing algorithms they say I can use either of these color spaces,
and I am not sure which one to pick. I mean, the algorithms just care that you provide
them with hue and saturation channel, you can pick which color space to use
Which one is best very much depends on what you're using it for. But in my experience HSL (HLS) has an unfortunate interaction between brightness and saturation.
Here's an example of reducing image brightness by 2. The leftmost image is the original; next comes the results using RGB, HLS, and HSV:
Notice the overly bright and saturated spots around the edge of the butterfly in HLS, particularly that red spot at the bottom. This is the saturation problem I was referring to.
This example was created in Python using the colorsys module for the conversions.
Since there is no accepted answer yet, and since I had to further research to fully understand this, I'll add my two cents.
Like others have said the answer as to which of HSL or HSV is better depends on what you're trying to model and manipulate.
tl;dr - HSV is only "better" than HSL for machine vision (with caveats, read below). "Lab" and other formal color models are far more accurate (but computationally expensive) and should really be used for more serious work. HSL is outright better for "paint" applications or any other where you need a human to "set", "enter" or otherwise understand/make sense of a color value.
For details, read below:
If you're trying to model how colours are GENERATED, the most intuitive model is HSL since it maps almost directly to how you'd mix paints to create colors. For example, to create "dark" yellow, you'd mix your base yellow paint with a bit of black. Whereas to create a lighter shade of yellow, you'd mix a bit of white.
Values between 50 and 0 in the "L" spectrum in HSL map to how much "black" has to be mixed in (black increasing from 0 to 100%, as L DECREASES from 50 to 0).
Values between 50 and 100 map to how much "white" has to be mixed in (white varying from 0 to 100% as L increases from 50 to 100%).
50% "L" gives you the "purest" form of the color without any "contamination" from white or black.
Insights from the below links:
1. http://forums.getpaint.net/index.php?/topic/22745-hsl-instead-of-hsv/
The last post there.
2. http://en.wikipedia.org/wiki/HSL_and_HSV
Inspect the color-space cylinder for HSL - it gives a very clear idea of the kind of distribution I've talked about.
Plus, if you've dealt with paints at any point, the above explanation will (hopefully) make sense. :)
Thus HSL is a very intuitive way of understanding how to "generate" a color - thus it's a great model for paint applications, or any other applications that are targeted to an audience used to thinking in "shade"/"tone" terms for color.
Now, onto HSV.
This is treacherous territory now as we get into a space based on a theory I HAVE FORMULATED to understand HSV and is not validated or corroborated by other sources.
In my view, the "V" in HSV maps to the quantity of light thrown at an object, with the assumption, that with zero light, the object would be completely dark, and with 100% light, it would be all white.
Thus, in this image of an apple, the point that is directly facing the light source is all white, and most likely has a "V" at 100% whereas the point at the bottom that is completely in shadow and untouched by light, has a value "0". (I haven't checked these values, just thought they'd be useful for explanation).
Thus HSV seems to model how objects are lit (and therefore account for any compensation you might have to perform for specular highlights or shadows in a machine vision application) BETTER than HSL.
But as you can see quite plainly from the examples in the "disadvantages" section in the Wikipedia article I linked to, neither of these methods are perfect. "Lab" and other more formal (and computationally expensive) color models do a far better job.
P.S: Hope this helps someone.
The only color space that has advantage and takes human perception into account is LAB, in the sense that the Euclidian metric in it is correlated with human color differentiation.
Taken directly from Wikipedia:
Unlike the RGB and CMYK color models, Lab color is designed to
approximate human vision. It aspires to perceptual uniformity, and its
L component closely matches human perception of lightness
That is the reason that many computer vision algorithms are taking advantage of LAB space
HSV, HSB and HSI don't have this property. So the answer is no, HSL is not "superior" over HSI and HSV in the sense of human perception.
If you want to be close to human perception, try LAB color space.
I would say that one is NO better than another, each is just a mathematical conversion of another. Differing representations CAN make manipulation of an image for the effect you wish a bit easier. Each person WILL perceive images a bit differently, and using HSI or HSV may provide a small difference in output image.
Even RGB when considered against a system (i.e. with pixel array) takes into account human perception. When an imager (with a bayer overlay) takes a picture, there are 2 green pixels for every 1 red and blue pixel. Monitors still output in RGB (although most only have a single green pixel for each red and blue). A new TV monitor made by Sharp now has a yellow output pixel. The reason they have done this is due to there being a yellow band in the actual frequency spectrum, so to better truly represent color, they have added a yellow band (or pixel).
All of these things are based on the human eye having a greater sensitivity to green over any other color in the spectrum.
Regardless, whatever scale you use, the image will be transformed back to RGB to be displayed on screen.
http://hyperphysics.phy-astr.gsu.edu/hbase/vision/colcon.html
http://www.physicsclassroom.com/class/light/u12l2b.cfm
In short, I dont think any one is better than another, just different representations.
http://en.wikipedia.org/wiki/Color
Imma throw my two cents in here being both a programmer and also a guy who aced Color Theory in art school before moving on to software engineering career wise.
HSL/HSV are great for easily writing programmatic functionality to handle color without dealing with a ton of edge cases. They are terrible at replicating human perception of color accurately.
CMYK is great for rendering print stuff, because it approximates the pigments that printers rely on. It is also terrible at replicating human perception of color accurately (although not because it's bad per se, but more because computers are really bad at displaying it on a screen. More on that in a minute).
RGB is the only color utility represented in tech that accurately reflects human vision effectively. LAB is essentially just resolving to RGB under the hood. It is also worth considering that the literal pixels on your screen are representations of RGB, which means that any other color space you work with is just going to get parsed back into RGB anyways when it actually displays. Really, it's best to just cut out the middleman and use that in almost every single case.
The problem with RGB in a programming sense, is that it is essentially cubic in representation, whereas HSL/HSV both resolve in a radius, which makes it much easier to create a "color wheel" programmatically. RGB is very difficult to do this with without writing huge piles of code to handle, because it resolves cubically in terms of its data representation. However, RGB accurately reflects human vision very well, and it's also the foundational basis of the actual hardware a monitor consists of.
TLDR; If you want dead on color and don't mind the extra work, use RGB all of the time. If you want to bang out a "good enough" color utility and probably field bug tickets later that you won't be able to really do anything about, use HSL/HSV. If you are doing print, use CMYK, not because it's good, but because the printer will choke if you don't use it, even though it otherwise sucks.
As an aside, if you were to approach Color Theory like an artist instead of a programmer, you are going to find a very different perception than any technical specifications about color really impart. Bear in mind that anyone working with a color utility you create is basically going to be thinking along these lines, at least if they have a solid foundational education in color theory. Here's basically how an artist approaches the notion of color:
Color from an artistic perspective is basically represented on a scale of five planes.
Pigment (or hue), which is the actual underlying color you are going after.
Tint, which is the pigment mixed with pure white.
Shade, which is the pigment mixed with pure black.
Tone (or "True Tone"), which is the pigment mixed with a varying degree of gray.
Rich Tone (or "Earth Tones"), which is the pigment mixed with its complementary color. Rich tones do not show up on the color wheel because they are inherently a mix of opposites, and visually reflect slightly differently than a "True Tone" due to minute discrepancies in physical media that you can't replicate effectively on a machine.
The typical problem with representing this paradigm programmatically is that there is not really any good way to represent rich tones. A material artist has basically no issue doing this with paint, because the subtle discrepancies of brush strokes allow the underlying variance between the complements to reflect in the composition. Likewise digital photography and video both suck at picking this up, but actual analog film does not suck nearly as bad at it. It is more reflected in photography and video than computer graphics because the texture of everything in the viewport of the camera picks up some of it, but is is still considerably less than actually viewing the same thing (which is why you can never take a really good picture of a sunset without a ton of post production to hack the literal look of it back in, for example). However, computers are not good at replicating those discrepancies, because a color is basically going to resolve to a consistent matrix of RGB pixel mapping which visually appears to be a flat regular tone. There is no computational color space that accurately reflects rich tones, because there is no computational way to make a color vary slightly in a diffuse, non-repeating random way over space and still have a single unique identifier, and you can't very well store it as data without a unique identifier.
The best approximation you can do of this with a computer is to create some kind of diffusion of one color overlapping another color, which does not resolve to a single value that you can represent as a hex code or stuff in a single database column. Even then, a computer is going to inherently reflect a uniform pattern, where a real rich tone relies on randomness and non-repeating texture and variance, which you can't do on a machine without considerable effort. All of the artwork that really makes color pop relies on this principle, and it is basically inaccessible to computational representation without a ton of side work to emulate it (which is why we have Photoshop and Corel Painter, because they can emulate this stuff pretty well with a bit of work, but at the cost of performing a lot of filtering that is not efficient for runtime).
RGB is a pretty good approximation of the other four characteristics from an artistic perspective. We pretty much get that it's not going to cover rich tones and that we're going to have to crack out a design utility and mash that part in by hand. However the underlying problem with programming in RGB is that it wants to resolve to a three dimensional space (because it is cubic), and you are trying to present it on a two dimensional display, which makes it very difficult to create UI that is reasonably intuitive because you lack the capacity to represent the depth of a 3rd axis on a computer monitor effectively in any way that is ever going to be intuitive to use for an end user.
You also need to consider the distinction between color represented as light, and color represented as pigment. RGB is a representation of color represented as light, and corresponds to the primary values used to mix lighting to represent color, and does so with a 1:1 mapping. CMYK represents the pigmentation spectrum. The distinction is that when you mix light in equal measure, you get white, and when you mix pigment in equal measure, you get black. If you are programming any utility that uses a computer, you are working with light, because pixels are inherently a single node on a monitor that emits RGB light waves. The reason I said that CMYK sucks, is not because it's not accurate, it's because it's not accurate when you try to represent it as light, which is the case on all computer monitors. If you are using actual paint, markers, colored pencils, etc, it works just fine. However representing CMYK on a screen still has to resolve to RGB, because that is how a computer monitor works, so it's always off a bit in terms of how it looks in display.
Not to go off on a gigantic side tangent, as this is a programming forum and you asked the question as a programmer. However if you are going for accuracy, there is a distinct "not technical" aspect to consider in terms of how effective your work will be at achieving its desired objective, which is to resolve well against visual perception, which is not particularly well represented in most computational color spaces. At the end of the day, the goal with any color utility is to make it look right in terms of human perception of color. HSL/HSV both fail miserably at that. They are prominent because they are easy to code with, and only for that reason. If you have a short deadline, they are acceptable answers. If you want something that is really going to work well, then you need to do the heavy legwork and consider this stuff, which is what your audience is considering when they decide if they want to use your tool or not.
Some reference points for you (I'm purposely avoiding any technical references, as they only refer to computational perspective, not the actual underlying perception of color, and you've probably read all of those already anyhow):
Color Theory Wiki
Basic breakdown of hue, tint, tone, and shade
Earth Tones (or rich tones if you prefer)
Basic fundamentals of color schemes
Actually, I'd have to argue that HSV accounts better for human visual perception as long as you understand that in HSV, saturation is the purity of the color and value is the intensity of that color, not brightness overall. Take this image, for example...
Here is a mapping of the HSL saturation (left) and HSL luminance (right)...
Note that the saturation is 100% until you hit the white at the very top where it drops suddenly. This mapping isn't perceived when looking at the original image. The same goes for the luminance mapping. While it's a clearer gradient, it only vaguely matches visually. Compare that to HSV saturation (left) and HSV value (right) below...
Here the saturation mapping can be seen dropping as the color becomes more white. Likewise, the value mapping can be very clearly seen in the original image. This is made more obvious when looking at the mappings for the individual color channels of the original image (the non-black areas almost perfectly match the value mapping, but are nowhere close to the luminance mapping)...Going by this information, I would have to say that HSV is better for working with actual images (especially photographs) whereas HSL is possibly better only for selecting colors in a color picker.
On a side note, the value in HSV is the inverse of the black in CMYK.
Another argument for the use of HSV over HSL is that HSV has much fewer combinations of different values that can result in the same color since HSL loses about half of its resolution to its top cone. Let's say you used bytes to represent the components--thereby giving each component 256 unique levels. The maximum number of unique RGB outputs this will yield in HSL is 4,372,984 colors (26% of the available RGB gamut). In HSV this goes up to 9,830,041 (59% of the RGB gamut)... over twice as many. And allowing a range of 0 to 359 for hue will yield 11,780,015 for HSV yet only 5,518,160 for HSL.

How to compensate for uneven illumination in a photograph of a printed page?

I am trying to teach my camera to be a scanner: I take pictures of printed text and then convert them to bitmaps (and then to djvu and OCR'ed). I need to compute a threshold for which pixels should be white and which black, but I'm stymied by uneven illumination. For example if the pixels in the center are dark enough, I'm likely to wind up with a bunch of black pixels in the corners.
What I would like to do, under relatively simple assumptions, is compensate for uneven illumination before thresholding. More precisely:
Assume one or two light sources, maybe one with gradual change in light intensity across the surface (ambient light) and another with an inverse square (direct light).
Assume that the white parts of the paper all have the same reflectivity/albedo/whatever.
Find some algorithm to estimate degree of illumination at each pixel, and from that recover the reflectivity of each pixel.
From a pixel's reflectivity, classify it white or black
I have no idea how to write an algorithm to do this. I don't want to fall back on least-squares fitting since I'd somehow like to ignore the dark pixels when estimating illumination. I also don't know if the algorithm will work.
All helpful advice will be upvoted!
EDIT: I've definitely considered chopping the image into pieces that are large enough so they still look like "text on a white background" but small enough so that illumination of a single piece is more or less even. I think if I then interpolate the thresholds so that there's no discontinuity across sub-image boundaries, I will probably get something halfway decent. This is a good suggestion, and I will have to give it a try, but it still leaves me with the problem of where to draw the line between white and black. More thoughts?
EDIT: Here are some screen dumps from GIMP showing different histograms and the "best" threshold value (chosen by hand) for each histogram. In two of the three a single threshold for the whole image is good enough. In the third, however, the upper left corner really needs a different threshold:
I'm not sure if you still need a solution after all this time, but if you still do. A few years ago I and my team photographed about 250,000 pages with a camera and converted them to (almost black and white ) grey scale images which we then DjVued ( also make pdfs of).
(See The catalogue and complete collection of photographic facsimiles of the 1144 paper transcripts of the French Institute of Pondicherry.)
We also ran into the problem of uneven illumination. We came up with a simple unsophisticated solution which worked very well in practice. This solution should also work to create black and white images rather than grey scale (as I'll describe).
The camera and lighting setup
a) We taped an empty picture frame to the top of a table to keep our pages in the exact same position.
b) We put a camera on a tripod also on top of the table above and pointing down at the taped picture frame and on a bar about a foot wide attached to the external flash holder on top of the camera we attached two "modelling lights". These can be purchased at any good camera shop. They are designed to provide even illumination. The camera was shaded from the lights by putting small cardboard box around each modelling light. We photographed in greyscale which we then further processed. (Our pages were old browned paper with blue ink writing so your case should be simpler).
Processing of the images
We used the free software package irfanview.
This software has a batch mode which can simultaneously do color correction, change the bit depth and crop the images. We would take the photograph of a page and then in interactive mode adjust the brightness, contrast and gamma settings till it was close to black and white. (We used greyscale but by setting the bit depth to 2 you will get black and white when you batch process all the pages.)
After determining the best color correction we then interactively cropped a single image and noted the cropping settings. We then set all these settings in the batch mode window and processed the pages for one book.
Creating DjVu images.
We used the free DjVu Solo 3.1 to create the DjVu images. This has several modes to create the DjVu images. The mode which creates black and white images didn't work well for us for photographs, but the "photo" mode did.
We didn't OCR (since the images were handwritten Sanskrit) but as long as the letters are evenly illuminated I think your OCR software should ignore big black areas like between a two page spread. But you can always get rid of the black between a two page spread or at the edges by cropping the pages twices once for the left hand pages and once for the right hand pages and the irfanview software will allow you to cleverly number your pages so you can then remerge the pages in the correct order. I.e rename your pages something like page-xxxA for lefthand pages and page-xxxB for righthand pages and the pages will then sort correctly on name.
If you still need a solution I hope some of the above is useful to you.
i would recommend calibrating the camera. considering that your lighting setup is fixed (that is the lights do not move between pictures), and your camera is grayscale (not color).
take a picture of a white sheet of paper which covers the whole workable area of your "scanner". store this picture, it tells what is white paper for each pixel. now, when you take take a picture of a document to scan, you can reload your "white reference picture" and even the illumination before performing a threshold.
let's call the white reference REF, the picture DOC, the even illumination picture EVEN, and the maximum value of a pixel MAX (for 8bit imaging, it is 255). for each pixel:
EVEN = DOC * (MAX/REF)
notes:
beware of the parenthesis: most image processing library uses the image pixel type for performing computation on pixel values and a simple multiplication will overload your pixel. eventually, write the loop yourself and use a 32 bit integer for intermediate computations.
the white reference image can be smoothed before being used in the process. any smoothing or blurring filter will do, and don't hesitate to apply it aggressively.
the MAX value in the formula above represents the target pixel value in the resulting image. using the maximum pixel value targets a bright white, but you can adjust this value to target a lighter gray.
Well. Usually the image processing I do is highly time sensitive, so a complex algorithm like the one you're seeking wouldn't work. But . . . have you considered chopping the image up into smaller pieces, and re-scaling each sub-image? That should make the 'dark' pixels stand out fairly well even in an image of variable lighting conditions (I am assuming here that you are talking about a standard mostly-white page with dark text.)
Its a cheat, but a lot easier than the 'right' way you're suggesting.
This might be horrendously slow, but what I'd recommend is to break the scanned surface into quarters/16ths and re-color them so that the average grayscale level is similar across the page. (Might break if you have pages with large margins though)
I assume that you are taking images of (relatively) small black letters on a white background.
One approach could be to "remove" the small black objects, while keeping the illumination variations of the background. This gives an estimate of how the image is illuminated, which can be used for normalizing the original image. It is often enough to subtract the illumination estimate from the original image and then do a threshold based segmentation.
This approach is based on gray scale morphological filters, and could be implemented in matlab like below:
img = imread('filename.png');
illumination = imclose(img, strel('disk', 10));
imgCorrected = img - illumination;
thresholdValue = graythresh(imgCorrected);
bw = imgCorrected > thresholdValue;
For an example with real images take a look at this guide from mathworks. For further reading about the use of morphological image analysis this book by Pierre Soille can be recommended.
Two algorithms come to my mind:
High-pass to alleviate the low-frequency illumination gradient
Local threshold with an appropriate radius
Adaptive thresholding is the keyword. Quote from a 2003 article by R.
Fisher, S. Perkins, A. Walker, and E. Wolfart: “This more sophisticated version
of thresholding can accommodate changing lighting conditions in the image, e.g.
those occurring as a result of a strong illumination gradient or shadows.”
ImageMagick's -lat option can do it, for example:
convert -lat 50x50-2000 input.jpg output.jpg
input.jpg
output.jpg
You could try using an edge detection filter, then a floodfill algorithm, to distinguish the background from the foreground. Interpolate the floodfilled region to determine the local illumination; you may also be able to modify the floodfill algorithm to use the local background value to jump across lines and fill boxes and so forth.
You could also try a Threshold Hysteresis with a rate of change control. Here is the link to the normal Threshold Hysteresis. Set the first threshold to a typical white value. Set the second threshold to less than the lowest white value in the corners.
The difference is that you want to check the difference between pixels for all values in between the first and second threshold. Ideally if the difference is positive, then act normally. But if it is negative, you only want to threshold if the difference is small.
This will be able to compensate for lighting variations, but will ignore the large changes between the background and the text.
Why don't you use simple opening and closing operations?
Try this, just lool at the results:
src - cource image
src - open(src)
close(src) - src
and look at the close - src result
using different window size, you will get backgound of the image.
I think this helps.

Resources