Is HSL Superior over HSI and HSV Color Spaces? - image-processing

Is HSL superior over HSI and HSV, because it takes human perception into account.?
For some image processing algorithms they say I can use either of these color spaces,
and I am not sure which one to pick. I mean, the algorithms just care that you provide
them with hue and saturation channel, you can pick which color space to use

Which one is best very much depends on what you're using it for. But in my experience HSL (HLS) has an unfortunate interaction between brightness and saturation.
Here's an example of reducing image brightness by 2. The leftmost image is the original; next comes the results using RGB, HLS, and HSV:
Notice the overly bright and saturated spots around the edge of the butterfly in HLS, particularly that red spot at the bottom. This is the saturation problem I was referring to.
This example was created in Python using the colorsys module for the conversions.

Since there is no accepted answer yet, and since I had to further research to fully understand this, I'll add my two cents.
Like others have said the answer as to which of HSL or HSV is better depends on what you're trying to model and manipulate.
tl;dr - HSV is only "better" than HSL for machine vision (with caveats, read below). "Lab" and other formal color models are far more accurate (but computationally expensive) and should really be used for more serious work. HSL is outright better for "paint" applications or any other where you need a human to "set", "enter" or otherwise understand/make sense of a color value.
For details, read below:
If you're trying to model how colours are GENERATED, the most intuitive model is HSL since it maps almost directly to how you'd mix paints to create colors. For example, to create "dark" yellow, you'd mix your base yellow paint with a bit of black. Whereas to create a lighter shade of yellow, you'd mix a bit of white.
Values between 50 and 0 in the "L" spectrum in HSL map to how much "black" has to be mixed in (black increasing from 0 to 100%, as L DECREASES from 50 to 0).
Values between 50 and 100 map to how much "white" has to be mixed in (white varying from 0 to 100% as L increases from 50 to 100%).
50% "L" gives you the "purest" form of the color without any "contamination" from white or black.
Insights from the below links:
1. http://forums.getpaint.net/index.php?/topic/22745-hsl-instead-of-hsv/
The last post there.
2. http://en.wikipedia.org/wiki/HSL_and_HSV
Inspect the color-space cylinder for HSL - it gives a very clear idea of the kind of distribution I've talked about.
Plus, if you've dealt with paints at any point, the above explanation will (hopefully) make sense. :)
Thus HSL is a very intuitive way of understanding how to "generate" a color - thus it's a great model for paint applications, or any other applications that are targeted to an audience used to thinking in "shade"/"tone" terms for color.
Now, onto HSV.
This is treacherous territory now as we get into a space based on a theory I HAVE FORMULATED to understand HSV and is not validated or corroborated by other sources.
In my view, the "V" in HSV maps to the quantity of light thrown at an object, with the assumption, that with zero light, the object would be completely dark, and with 100% light, it would be all white.
Thus, in this image of an apple, the point that is directly facing the light source is all white, and most likely has a "V" at 100% whereas the point at the bottom that is completely in shadow and untouched by light, has a value "0". (I haven't checked these values, just thought they'd be useful for explanation).
Thus HSV seems to model how objects are lit (and therefore account for any compensation you might have to perform for specular highlights or shadows in a machine vision application) BETTER than HSL.
But as you can see quite plainly from the examples in the "disadvantages" section in the Wikipedia article I linked to, neither of these methods are perfect. "Lab" and other more formal (and computationally expensive) color models do a far better job.
P.S: Hope this helps someone.

The only color space that has advantage and takes human perception into account is LAB, in the sense that the Euclidian metric in it is correlated with human color differentiation.
Taken directly from Wikipedia:
Unlike the RGB and CMYK color models, Lab color is designed to
approximate human vision. It aspires to perceptual uniformity, and its
L component closely matches human perception of lightness
That is the reason that many computer vision algorithms are taking advantage of LAB space
HSV, HSB and HSI don't have this property. So the answer is no, HSL is not "superior" over HSI and HSV in the sense of human perception.
If you want to be close to human perception, try LAB color space.

I would say that one is NO better than another, each is just a mathematical conversion of another. Differing representations CAN make manipulation of an image for the effect you wish a bit easier. Each person WILL perceive images a bit differently, and using HSI or HSV may provide a small difference in output image.
Even RGB when considered against a system (i.e. with pixel array) takes into account human perception. When an imager (with a bayer overlay) takes a picture, there are 2 green pixels for every 1 red and blue pixel. Monitors still output in RGB (although most only have a single green pixel for each red and blue). A new TV monitor made by Sharp now has a yellow output pixel. The reason they have done this is due to there being a yellow band in the actual frequency spectrum, so to better truly represent color, they have added a yellow band (or pixel).
All of these things are based on the human eye having a greater sensitivity to green over any other color in the spectrum.
Regardless, whatever scale you use, the image will be transformed back to RGB to be displayed on screen.
http://hyperphysics.phy-astr.gsu.edu/hbase/vision/colcon.html
http://www.physicsclassroom.com/class/light/u12l2b.cfm
In short, I dont think any one is better than another, just different representations.
http://en.wikipedia.org/wiki/Color

Imma throw my two cents in here being both a programmer and also a guy who aced Color Theory in art school before moving on to software engineering career wise.
HSL/HSV are great for easily writing programmatic functionality to handle color without dealing with a ton of edge cases. They are terrible at replicating human perception of color accurately.
CMYK is great for rendering print stuff, because it approximates the pigments that printers rely on. It is also terrible at replicating human perception of color accurately (although not because it's bad per se, but more because computers are really bad at displaying it on a screen. More on that in a minute).
RGB is the only color utility represented in tech that accurately reflects human vision effectively. LAB is essentially just resolving to RGB under the hood. It is also worth considering that the literal pixels on your screen are representations of RGB, which means that any other color space you work with is just going to get parsed back into RGB anyways when it actually displays. Really, it's best to just cut out the middleman and use that in almost every single case.
The problem with RGB in a programming sense, is that it is essentially cubic in representation, whereas HSL/HSV both resolve in a radius, which makes it much easier to create a "color wheel" programmatically. RGB is very difficult to do this with without writing huge piles of code to handle, because it resolves cubically in terms of its data representation. However, RGB accurately reflects human vision very well, and it's also the foundational basis of the actual hardware a monitor consists of.
TLDR; If you want dead on color and don't mind the extra work, use RGB all of the time. If you want to bang out a "good enough" color utility and probably field bug tickets later that you won't be able to really do anything about, use HSL/HSV. If you are doing print, use CMYK, not because it's good, but because the printer will choke if you don't use it, even though it otherwise sucks.
As an aside, if you were to approach Color Theory like an artist instead of a programmer, you are going to find a very different perception than any technical specifications about color really impart. Bear in mind that anyone working with a color utility you create is basically going to be thinking along these lines, at least if they have a solid foundational education in color theory. Here's basically how an artist approaches the notion of color:
Color from an artistic perspective is basically represented on a scale of five planes.
Pigment (or hue), which is the actual underlying color you are going after.
Tint, which is the pigment mixed with pure white.
Shade, which is the pigment mixed with pure black.
Tone (or "True Tone"), which is the pigment mixed with a varying degree of gray.
Rich Tone (or "Earth Tones"), which is the pigment mixed with its complementary color. Rich tones do not show up on the color wheel because they are inherently a mix of opposites, and visually reflect slightly differently than a "True Tone" due to minute discrepancies in physical media that you can't replicate effectively on a machine.
The typical problem with representing this paradigm programmatically is that there is not really any good way to represent rich tones. A material artist has basically no issue doing this with paint, because the subtle discrepancies of brush strokes allow the underlying variance between the complements to reflect in the composition. Likewise digital photography and video both suck at picking this up, but actual analog film does not suck nearly as bad at it. It is more reflected in photography and video than computer graphics because the texture of everything in the viewport of the camera picks up some of it, but is is still considerably less than actually viewing the same thing (which is why you can never take a really good picture of a sunset without a ton of post production to hack the literal look of it back in, for example). However, computers are not good at replicating those discrepancies, because a color is basically going to resolve to a consistent matrix of RGB pixel mapping which visually appears to be a flat regular tone. There is no computational color space that accurately reflects rich tones, because there is no computational way to make a color vary slightly in a diffuse, non-repeating random way over space and still have a single unique identifier, and you can't very well store it as data without a unique identifier.
The best approximation you can do of this with a computer is to create some kind of diffusion of one color overlapping another color, which does not resolve to a single value that you can represent as a hex code or stuff in a single database column. Even then, a computer is going to inherently reflect a uniform pattern, where a real rich tone relies on randomness and non-repeating texture and variance, which you can't do on a machine without considerable effort. All of the artwork that really makes color pop relies on this principle, and it is basically inaccessible to computational representation without a ton of side work to emulate it (which is why we have Photoshop and Corel Painter, because they can emulate this stuff pretty well with a bit of work, but at the cost of performing a lot of filtering that is not efficient for runtime).
RGB is a pretty good approximation of the other four characteristics from an artistic perspective. We pretty much get that it's not going to cover rich tones and that we're going to have to crack out a design utility and mash that part in by hand. However the underlying problem with programming in RGB is that it wants to resolve to a three dimensional space (because it is cubic), and you are trying to present it on a two dimensional display, which makes it very difficult to create UI that is reasonably intuitive because you lack the capacity to represent the depth of a 3rd axis on a computer monitor effectively in any way that is ever going to be intuitive to use for an end user.
You also need to consider the distinction between color represented as light, and color represented as pigment. RGB is a representation of color represented as light, and corresponds to the primary values used to mix lighting to represent color, and does so with a 1:1 mapping. CMYK represents the pigmentation spectrum. The distinction is that when you mix light in equal measure, you get white, and when you mix pigment in equal measure, you get black. If you are programming any utility that uses a computer, you are working with light, because pixels are inherently a single node on a monitor that emits RGB light waves. The reason I said that CMYK sucks, is not because it's not accurate, it's because it's not accurate when you try to represent it as light, which is the case on all computer monitors. If you are using actual paint, markers, colored pencils, etc, it works just fine. However representing CMYK on a screen still has to resolve to RGB, because that is how a computer monitor works, so it's always off a bit in terms of how it looks in display.
Not to go off on a gigantic side tangent, as this is a programming forum and you asked the question as a programmer. However if you are going for accuracy, there is a distinct "not technical" aspect to consider in terms of how effective your work will be at achieving its desired objective, which is to resolve well against visual perception, which is not particularly well represented in most computational color spaces. At the end of the day, the goal with any color utility is to make it look right in terms of human perception of color. HSL/HSV both fail miserably at that. They are prominent because they are easy to code with, and only for that reason. If you have a short deadline, they are acceptable answers. If you want something that is really going to work well, then you need to do the heavy legwork and consider this stuff, which is what your audience is considering when they decide if they want to use your tool or not.
Some reference points for you (I'm purposely avoiding any technical references, as they only refer to computational perspective, not the actual underlying perception of color, and you've probably read all of those already anyhow):
Color Theory Wiki
Basic breakdown of hue, tint, tone, and shade
Earth Tones (or rich tones if you prefer)
Basic fundamentals of color schemes

Actually, I'd have to argue that HSV accounts better for human visual perception as long as you understand that in HSV, saturation is the purity of the color and value is the intensity of that color, not brightness overall. Take this image, for example...
Here is a mapping of the HSL saturation (left) and HSL luminance (right)...
Note that the saturation is 100% until you hit the white at the very top where it drops suddenly. This mapping isn't perceived when looking at the original image. The same goes for the luminance mapping. While it's a clearer gradient, it only vaguely matches visually. Compare that to HSV saturation (left) and HSV value (right) below...
Here the saturation mapping can be seen dropping as the color becomes more white. Likewise, the value mapping can be very clearly seen in the original image. This is made more obvious when looking at the mappings for the individual color channels of the original image (the non-black areas almost perfectly match the value mapping, but are nowhere close to the luminance mapping)...Going by this information, I would have to say that HSV is better for working with actual images (especially photographs) whereas HSL is possibly better only for selecting colors in a color picker.
On a side note, the value in HSV is the inverse of the black in CMYK.
Another argument for the use of HSV over HSL is that HSV has much fewer combinations of different values that can result in the same color since HSL loses about half of its resolution to its top cone. Let's say you used bytes to represent the components--thereby giving each component 256 unique levels. The maximum number of unique RGB outputs this will yield in HSL is 4,372,984 colors (26% of the available RGB gamut). In HSV this goes up to 9,830,041 (59% of the RGB gamut)... over twice as many. And allowing a range of 0 to 359 for hue will yield 11,780,015 for HSV yet only 5,518,160 for HSL.

Related

Role of color profile in graphics program

I think I understand what color profiles are. I do not understand, what is the difference in manipulating photo for example in photoshop in 16bpp sRGB and 16bpp Adobe RGB. My monitor can only show me sRGB.
Is there any difference in algorithms?
Maybe there is some preprocessing executed before program displays effects of my work (for example AdobeRGB(0.3, 0.25, 0.82) is being displayed as sRGB(0.301, 0.253, 0.819) in my monitor)?
Is there any sense in using different color profiles when I am not using ICC profile of my monitor/printer?
In general – what should I do if I would want to develop my own graphics-manipulating application that supports profile different than sRGB (for example in Qt)?
The color space your image uses determines how your 16 bits per pixel should relate to the output produced by your monitor, i.e., it determines what colors the numbers actually represent.
This can make a difference in the way some algorithms are processed if they are supposed to make realistic, natural-looking, or consistent results.
Let's say you composite a semi-transparent yellow on top of a dark red background? What kind of brown do you get? If the algorithm always mixes the pixel data the same way, then even when the yellow and red look the same on your monitor, the brown you get might be different because of your color space.
A more 'correct' way to do mixing would be to transform your pixel data into a consistent color space, mix, and then transform back. If the original colors look the same on two monitors with different calibrated profiles, then they will transform into the same numbers in a consistent color space, and the mix result will transform back into results that look the same on both monitors even though the pixel values might be different.
Natural-looking compositing with semi-transparency is a good example of an algorithm that has to take your color space into account in order to produce realistic results. Other effects that have to look 'natural', like specular highlights, shadows, etc., similarly need to do physically accurate math in a consistent color space.
To answer your specific questions:
Yes, as explained, many algorithms should perform different calculations with different color spaces.
Yes, there is. The image's color space defines what the data means in terms of physical light. If you display it with an ICC calibrated profile, it is transformed into the numbers that your monitor needs to accurately display your image.
It should make very little difference what color space you use for your image, except that some display software won't take it into account. Making sRGB images is better for cross-system compatibility, but I think Adobe RBG has a bigger gamut and can actually represent some green colors that sRGB can't. You should use printer and monitor calibration so that you can SEE what your image really looks like.
I think I answered that above.
Their's no differences in algorithms because you operate in RGB color space and not in XYZ color space. Monitors like you said shows colors differently, the red on one monitor may not exactly match the red primary on another monitor. In order to define different RGB color spaces in a common manner, monitors use the CIE 1931 XYZ color space. Every monitor or system calculate RGB color to XYZ according to used profiles, for example: RGB (1,0,0) = XYZ (0.4358, 0.2224, 0.0139) in sRGB and XYZ (0.7977, 0.2880, 0.0000) in ProPhotoRGB.
For further information see:
http://ninedegreesbelow.com/photography/xyz-rgb.html
http://www.ryanjuckett.com/programming/rgb-color-space-conversion/
Gamut mapping explained by analogy
If you change color spaces, you may lose some of the information because the mapping from one to the other may not be injective (invertible). You may choose among different rendering intents to pick the mapping that throws only the information you find least useful away.
This analogy might illustrate the consequences of converting an image to a smaller color space when the original space is larger than the one of your device: You can very well represent a 3D object in the computer, but you will never actually see it, because your screen is flat and thus able to display only 2D images. You can view projections of the object, you can view cuts through the object, but you need a 3D printer to get something really 3D out of it.
Even if you have no 3D printer, it is worth representing the object in 3D and not as a fixed 2D projection. Otherwise, you would not be able to make all those 2D cuts and projections, and even if you bought a 3D printer in the future, you could not print the object anymore.
The 3D object is a picture in the larger space, a fixed 2D projection is a picture in a smaller space, screen is a device with the smaller color space and 3D printer is a device with the larger color space. End of analogy.
ICC workflow
If you take a photo, you camera should assign a profile to it, describing the device color space of the camera. The profile defines mapping of the numbers inside the picture (coordinates in device color space) to real-world colors (coordinates in an absolute color space). Therefore, without a profile, the numbers really have no meaning and anyone is free to make up any mapping they like.
If you shoot RAW, you do the color space conversion when developing the photo; if you shoot JPEG, the camera performs this task for you.
In the opposite direction, when displaying or printing: If the display device is not calibrated and has no profile, the real-world colors stored in the image might not match what comes out of the device in reality. The mapping between the image color space and the output device space could not guarantee that the colors will be preserved and is somewhat arbitrary.
Actual answers
The difference in manipulating the photo in sRGB and Adobe RGB is that Adobe RGB is larger and thus preserves more information for further processing.
The difference in algorithms has already been explained by Matt Timmermans in another answer. Regarding color blending, you might want to know more about perceptually uniform color spaces (see e.g. a closed Q & A on SO).
Yes, conversion from Adobe RGB to sRGB is not identity and thus requires some processing. Where exactly this processing is done (device driver, OS kernel, image processing software) depends on the source and target, the OS and their settings. If you convert the spaces in Photoshop, it does the computation itself. Windows have a built-in color management module that takes care of converting an image with profile to the device color space of the output device.
The image you want to display/print might be stored in some rather exotic color space. If the OS guesses it is in sRGB (Windows would), it might give odd results. It is better to provide as much information as possible to the color management system. Even uncalibrated devices might be assigned some generic profiles, some guesswork might take place. And maybe, you’ll calibrate and characterize your device someday, or you’ll send the image to someone with such a device.
Qt itself does not support color management. However, KDE, which is built atop Qt, supports some color management via Oyranos.
When should we expect complete color management for KDE?
If we are talking about color management in Qt, not anytime soon. If we are talking about decent color management implemented in the compositor (KWin), sooner than not anytime soon. It also depends on how quickly the graphics applications adapt to these new color management things.
You could use Oyranos or another color management system directly in your application. Google told me about a thesis about getting color management to Qt, too.
Related reading
Generalities about colors # color-management-guide.com
ICC FAQ
Windows 7: Change color management settings
Windows Vista: Color management settings FAQ
Introduction to Color Management in Microsoft Windows Operating Systems
Windows Color System # MSDN

Is a change in colorspace needed when comparing colors only evaluated by the computer?

I am confused about the need to change color space for color comparison. I have read about delta E, the Lab format, and I do understand that comparisons in the RGB color space will not seem appropriate to the human eye. However, my program uses a linear color scale to calculate velocity, from a color flow Doppler signal. It takes the mean color of a sample region and compares it to the colors of the scale to find its nearest neighbor using Euclidian distance. I do that entirely in the BGR (OpenCV) color space, as the example image below:
Here, I obtain seemingly correct velocity values for each color circle, but is it only by chance, or is my assumption correct that since the color comparisons take place internally, it does not matter what color space I am in?
Since you searchind for nearest neighbour, and operate with 3D points (in color space) it does not matter what color space you choose, they will only be displayed in different ways.
Comparison of colour is not straight forward. You need to decide what defines a colour being close to another and then pick the most appropriate colour space to support that.
For example, working in HSL will give you an easy way to assess colours based upon the hue. This is fine if you are happy to disregard, or at least reduce the relevance of saturation and luminance.
If on the other hand, you want a point change in saturation to be a relevant as a point change in hue, working in RGB or perhaps CMYK would be more appropriate. Measuring the distance by plotting the channels as three axis and then creating a distance between the two colours. This has the downside that a 10 point shift in saturation has the same measured difference as a 10 point shift in hue, which visually will not make that much sense as the perceived difference will not be equivalent to the mathematical.
And that brings in another consideration. The human eye is more sensitive to colour variance around different colours. Green for example, takes more variation to be noticeable than magentas. All down to evolution but may have a bearing in your representation.
Personally I tend to work with RGB as it is needed for visual display, but most commonly I will arrange colours by hue so keep a conversion handy to HSL/ HSB.

Better converting to gray?

Is there a better way to converting (RGB) image to grayscale than
This way produces light intensity, which may not mark objects well for further processing. For example, if we have some hotspot or reflection, this will be depicted as noticeable object in such a grayscale.
I am experimenting with other color spaces like Lab, but hey have poor contrast.
It's not that simple as asking for a recipe - you need to define what you need.
The transform you used harks back to the early days of color TV, when there was a need for a way to separately encode the luminance and chrominance in the analog broadcast signal, taking into account the fact that a lot less bandwidth was available to transmit chroma than luma. The encoding is very loosely related to the higher relative sensitivity of the cones in the human retina in the yellow-green band.
There is no reason to use it blindly. Rather, you need to clearly express what the goal of your desired transformation is, translate that goal into a (quantifiable) criterion, then find a particular transform that optimizes that criterion. The transform can be global (i.e. like the TV one you used) or adaptive (i.e. depending on the color values in a neighborhood of the current pixel), and either way it can be linear (like, again, the TV one) or not.
Since people actually can identify the terms "shadow" and "reflection", it stands that this is a decently high level operation. In addition, a person can be "blinded" or confused due to these effects. So I will go with "No, there is no significantly better, low-level way to eliminate different luminance effects".
You can make a module that detects adjacent lightness-distorted regions (based on cues like hue and chroma, spatial factors of whether they form a "jigsaw puzzle", etc), and stitch them together.
I recommend HSV because it has worked well for me for quite reliably overcoming shadows in images.
A trick you can use with Lab is to just ignore the L channel, then the other two channels just give variation in color. This can be very effective if you want to find the boundaries of an object that has a bright light shining on it.
There are many other color spaces that separate brightness from color information, like Lab. Some examples are HSV, YUV, YCrCb. Just pick whichever of these works best, discard the brightness and work with two channels of color.
Lab is a 'perceptual" color space that attempts to match non-linearities in the eye. That is Lab numbers that are close together will be perceived as very similar by a human, while Lab numbers that differ greatly will be perceived as very different. RGB does not work nicely like that.
Some notes about the conversion you mentioned:
If use CV_RGB2GRAY conversion in OpenCV, it uses the coefficeints that you mentioned. However whether these are the correct numbers to use depends on the flavor of RGB you have.
Your numbers are for BT.601 primaries as used in analogue TV such as NTSC and PAL.
Newer HDTV, and sRGB which is widely used in computer monitors and printers uses BT. 709 primaries, in which case the conversion should be Y = 0.2126 R + 0.7152 G + 0.0722 B, and Y here is as defined by CIE 1931. The L channel in Lab also corresponds to the CIE 1931 luminance value.
Then there is Adobe RGB, which can represent more colors than sRGB (it has a wider "gamut"). But I don't think OpenCV has a conversion for it.
The best way to convert RGB to grayscale depends on where your image comes from and what you want to do with it.
It would be worth looking at the OpenCV cvtColor() documentation.
There are a few works in this field. For example this one: http://dl.acm.org/citation.cfm?id=2407754

why we should use gray scale for image processing

I think this can be a stupid question but after read a lot and search a lot about image processing every example I see about image processing uses gray scale to work
I understood that gray scale images use just one channel of color, that normally is necessary just 8 bit to be represented, etc... but, why use gray scale when we have a color image? What are the advantages of a gray scale? I could imagine that is because we have less bits to treat but even today with faster computers this is necessary?
I am not sure if I was clear about my doubt, I hope someone can answer me
thank you very much
As explained by John Zhang:
luminance is by far more important in distinguishing visual features
John also gives an excellent suggestion to illustrate this property: take a given image and separate the luminance plane from the chrominance planes.
To do so you can use ImageMagick separate operator that extracts the current contents of each channel as a gray-scale image:
convert myimage.gif -colorspace YCbCr -separate sep_YCbCr_%d.gif
Here's what it gives on a sample image (top-left: original color image, top-right: luminance plane, bottom row: chrominance planes):
To elaborate a bit on deltheil's answer:
Signal to noise. For many applications of image processing, color information doesn't help us identify important edges or other features. There are exceptions. If there is an edge (a step change in pixel value) in hue that is hard to detect in a grayscale image, or if we need to identify objects of known hue (orange fruit in front of green leaves), then color information could be useful. If we don't need color, then we can consider it noise. At first it's a bit counterintuitive to "think" in grayscale, but you get used to it.
Complexity of the code. If you want to find edges based on luminance AND chrominance, you've got more work ahead of you. That additional work (and additional debugging, additional pain in supporting the software, etc.) is hard to justify if the additional color information isn't helpful for applications of interest.
For learning image processing, it's better to understand grayscale processing first and understand how it applies to multichannel processing rather than starting with full color imaging and missing all the important insights that can (and should) be learned from single channel processing.
Difficulty of visualization. In grayscale images, the watershed algorithm is fairly easy to conceptualize because we can think of the two spatial dimensions and one brightness dimension as a 3D image with hills, valleys, catchment basins, ridges, etc. "Peak brightness" is just a mountain peak in our 3D visualization of the grayscale image. There are a number of algorithms for which an intuitive "physical" interpretation helps us think through a problem. In RGB, HSI, Lab, and other color spaces this sort of visualization is much harder since there are additional dimensions that the standard human brain can't visualize easily. Sure, we can think of "peak redness," but what does that mountain peak look like in an (x,y,h,s,i) space? Ouch. One workaround is to think of each color variable as an intensity image, but that leads us right back to grayscale image processing.
Color is complex. Humans perceive color and identify color with deceptive ease. If you get into the business of attempting to distinguish colors from one another, then you'll either want to (a) follow tradition and control the lighting, camera color calibration, and other factors to ensure the best results, or (b) settle down for a career-long journey into a topic that gets deeper the more you look at it, or (c) wish you could be back working with grayscale because at least then the problems seem solvable.
Speed. With modern computers, and with parallel programming, it's possible to perform simple pixel-by-pixel processing of a megapixel image in milliseconds. Facial recognition, OCR, content-aware resizing, mean shift segmentation, and other tasks can take much longer than that. Whatever processing time is required to manipulate the image or squeeze some useful data from it, most customers/users want it to go faster. If we make the hand-wavy assumption that processing a three-channel color image takes three times as long as processing a grayscale image--or maybe four times as long, since we may create a separate luminance channel--then that's not a big deal if we're processing video images on the fly and each frame can be processed in less than 1/30th or 1/25th of a second. But if we're analyzing thousands of images from a database, it's great if we can save ourselves processing time by resizing images, analyzing only portions of images, and/or eliminating color channels we don't need. Cutting processing time by a factor of three to four can mean the difference between running an 8-hour overnight test that ends before you get back to work, and having your computer's processors pegged for 24 hours straight.
Of all these, I'll emphasize the first two: make the image simpler, and reduce the amount of code you have to write.
I disagree with the implication that gray scale images are always better than color images; it depends on the technique and the overall goal of the processing. For example, if you wanted to count the bananas in an image of a fruit bowl image, then it's much easier to segment when you have a colored image!
Many images have to be in grayscale because of the measuring device used to obtain them. Think of an electron microscope. It's measuring the strength of an electron beam at various space points. An AFM is measuring the amount of resonance vibrations at various points topologically on a sample. In both cases, these tools are returning a singular value- an intensity, so they implicitly are creating a gray-scale image.
For image processing techniques based on brightness, they often can be applied sufficiently to the overall brightness (grayscale); however, there are many many instances where having a colored image is an advantage.
Binary might be too simple and it could not represent the picture character.
Color might be too much and affect the processing speed.
Thus, grayscale is chosen, which is in the mid of the two ends.
First of starting image processing whether on gray scale or color images, it is better to focus on the applications which we are applying. Unless and otherwise, if we choose one of them randomly, it will create accuracy problem in our result. For example, if I want to process image of waste bin, I prefer to choose gray scale rather than color. Because in the bin image I want only to detect the shape of bin image using optimized edge detection. I could not bother about the color of image but I want to see rectangular shape of the bin image correctly.

How to compensate for uneven illumination in a photograph of a printed page?

I am trying to teach my camera to be a scanner: I take pictures of printed text and then convert them to bitmaps (and then to djvu and OCR'ed). I need to compute a threshold for which pixels should be white and which black, but I'm stymied by uneven illumination. For example if the pixels in the center are dark enough, I'm likely to wind up with a bunch of black pixels in the corners.
What I would like to do, under relatively simple assumptions, is compensate for uneven illumination before thresholding. More precisely:
Assume one or two light sources, maybe one with gradual change in light intensity across the surface (ambient light) and another with an inverse square (direct light).
Assume that the white parts of the paper all have the same reflectivity/albedo/whatever.
Find some algorithm to estimate degree of illumination at each pixel, and from that recover the reflectivity of each pixel.
From a pixel's reflectivity, classify it white or black
I have no idea how to write an algorithm to do this. I don't want to fall back on least-squares fitting since I'd somehow like to ignore the dark pixels when estimating illumination. I also don't know if the algorithm will work.
All helpful advice will be upvoted!
EDIT: I've definitely considered chopping the image into pieces that are large enough so they still look like "text on a white background" but small enough so that illumination of a single piece is more or less even. I think if I then interpolate the thresholds so that there's no discontinuity across sub-image boundaries, I will probably get something halfway decent. This is a good suggestion, and I will have to give it a try, but it still leaves me with the problem of where to draw the line between white and black. More thoughts?
EDIT: Here are some screen dumps from GIMP showing different histograms and the "best" threshold value (chosen by hand) for each histogram. In two of the three a single threshold for the whole image is good enough. In the third, however, the upper left corner really needs a different threshold:
I'm not sure if you still need a solution after all this time, but if you still do. A few years ago I and my team photographed about 250,000 pages with a camera and converted them to (almost black and white ) grey scale images which we then DjVued ( also make pdfs of).
(See The catalogue and complete collection of photographic facsimiles of the 1144 paper transcripts of the French Institute of Pondicherry.)
We also ran into the problem of uneven illumination. We came up with a simple unsophisticated solution which worked very well in practice. This solution should also work to create black and white images rather than grey scale (as I'll describe).
The camera and lighting setup
a) We taped an empty picture frame to the top of a table to keep our pages in the exact same position.
b) We put a camera on a tripod also on top of the table above and pointing down at the taped picture frame and on a bar about a foot wide attached to the external flash holder on top of the camera we attached two "modelling lights". These can be purchased at any good camera shop. They are designed to provide even illumination. The camera was shaded from the lights by putting small cardboard box around each modelling light. We photographed in greyscale which we then further processed. (Our pages were old browned paper with blue ink writing so your case should be simpler).
Processing of the images
We used the free software package irfanview.
This software has a batch mode which can simultaneously do color correction, change the bit depth and crop the images. We would take the photograph of a page and then in interactive mode adjust the brightness, contrast and gamma settings till it was close to black and white. (We used greyscale but by setting the bit depth to 2 you will get black and white when you batch process all the pages.)
After determining the best color correction we then interactively cropped a single image and noted the cropping settings. We then set all these settings in the batch mode window and processed the pages for one book.
Creating DjVu images.
We used the free DjVu Solo 3.1 to create the DjVu images. This has several modes to create the DjVu images. The mode which creates black and white images didn't work well for us for photographs, but the "photo" mode did.
We didn't OCR (since the images were handwritten Sanskrit) but as long as the letters are evenly illuminated I think your OCR software should ignore big black areas like between a two page spread. But you can always get rid of the black between a two page spread or at the edges by cropping the pages twices once for the left hand pages and once for the right hand pages and the irfanview software will allow you to cleverly number your pages so you can then remerge the pages in the correct order. I.e rename your pages something like page-xxxA for lefthand pages and page-xxxB for righthand pages and the pages will then sort correctly on name.
If you still need a solution I hope some of the above is useful to you.
i would recommend calibrating the camera. considering that your lighting setup is fixed (that is the lights do not move between pictures), and your camera is grayscale (not color).
take a picture of a white sheet of paper which covers the whole workable area of your "scanner". store this picture, it tells what is white paper for each pixel. now, when you take take a picture of a document to scan, you can reload your "white reference picture" and even the illumination before performing a threshold.
let's call the white reference REF, the picture DOC, the even illumination picture EVEN, and the maximum value of a pixel MAX (for 8bit imaging, it is 255). for each pixel:
EVEN = DOC * (MAX/REF)
notes:
beware of the parenthesis: most image processing library uses the image pixel type for performing computation on pixel values and a simple multiplication will overload your pixel. eventually, write the loop yourself and use a 32 bit integer for intermediate computations.
the white reference image can be smoothed before being used in the process. any smoothing or blurring filter will do, and don't hesitate to apply it aggressively.
the MAX value in the formula above represents the target pixel value in the resulting image. using the maximum pixel value targets a bright white, but you can adjust this value to target a lighter gray.
Well. Usually the image processing I do is highly time sensitive, so a complex algorithm like the one you're seeking wouldn't work. But . . . have you considered chopping the image up into smaller pieces, and re-scaling each sub-image? That should make the 'dark' pixels stand out fairly well even in an image of variable lighting conditions (I am assuming here that you are talking about a standard mostly-white page with dark text.)
Its a cheat, but a lot easier than the 'right' way you're suggesting.
This might be horrendously slow, but what I'd recommend is to break the scanned surface into quarters/16ths and re-color them so that the average grayscale level is similar across the page. (Might break if you have pages with large margins though)
I assume that you are taking images of (relatively) small black letters on a white background.
One approach could be to "remove" the small black objects, while keeping the illumination variations of the background. This gives an estimate of how the image is illuminated, which can be used for normalizing the original image. It is often enough to subtract the illumination estimate from the original image and then do a threshold based segmentation.
This approach is based on gray scale morphological filters, and could be implemented in matlab like below:
img = imread('filename.png');
illumination = imclose(img, strel('disk', 10));
imgCorrected = img - illumination;
thresholdValue = graythresh(imgCorrected);
bw = imgCorrected > thresholdValue;
For an example with real images take a look at this guide from mathworks. For further reading about the use of morphological image analysis this book by Pierre Soille can be recommended.
Two algorithms come to my mind:
High-pass to alleviate the low-frequency illumination gradient
Local threshold with an appropriate radius
Adaptive thresholding is the keyword. Quote from a 2003 article by R.
Fisher, S. Perkins, A. Walker, and E. Wolfart: “This more sophisticated version
of thresholding can accommodate changing lighting conditions in the image, e.g.
those occurring as a result of a strong illumination gradient or shadows.”
ImageMagick's -lat option can do it, for example:
convert -lat 50x50-2000 input.jpg output.jpg
input.jpg
output.jpg
You could try using an edge detection filter, then a floodfill algorithm, to distinguish the background from the foreground. Interpolate the floodfilled region to determine the local illumination; you may also be able to modify the floodfill algorithm to use the local background value to jump across lines and fill boxes and so forth.
You could also try a Threshold Hysteresis with a rate of change control. Here is the link to the normal Threshold Hysteresis. Set the first threshold to a typical white value. Set the second threshold to less than the lowest white value in the corners.
The difference is that you want to check the difference between pixels for all values in between the first and second threshold. Ideally if the difference is positive, then act normally. But if it is negative, you only want to threshold if the difference is small.
This will be able to compensate for lighting variations, but will ignore the large changes between the background and the text.
Why don't you use simple opening and closing operations?
Try this, just lool at the results:
src - cource image
src - open(src)
close(src) - src
and look at the close - src result
using different window size, you will get backgound of the image.
I think this helps.

Resources