I have found that using the Euclidean distance in RGB space to compare two colors, in applications like image segmentation, is not recommended because of its sensitivity to illumination and lighting conditions. Furthermore, because the HSV hue value is numerically unstable at low intensity, the CIELAB color space is said to be a better alternative.
My problem is that I don't understand how to actually use it: Since CIELAB is device independent, you cannot simply convert to it from some RGB values without knowing anything about the sensor that was used to obtain these RGB values. As far as I know, you have to convert to CIEXYZ in an intermediate step first, but there are several different matrices available depending on the exact RGB working space of the source.
Or is it irrelevant which matrix you choose if you only want to use CIELAB to compare two colors (as I said, for example to perform image segmentation)?
If you don't know the exact color space that you're converting from, you may use sRGB - it was designed to be a generic space that corresponded to the average monitor of the time. It won't be exact of course, but it's likely to be acceptable. As you observe, perfect accuracy shouldn't be necessary for image segmentation, as the relative distances between colors won't be materially affected.
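For what it's worth, here is a minimal sketch of that approach, assuming the inputs are 8-bit sRGB and using OpenCV's built-in conversion (which assumes sRGB primaries with a D65 white point). The plain Euclidean distance in Lab is the old CIE76 delta E:

```python
import cv2
import numpy as np

def rgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB via OpenCV.

    OpenCV's COLOR_RGB2Lab assumes sRGB with a D65 white point, which is
    usually acceptable when the true camera space is unknown.
    """
    rgb = np.asarray(rgb, dtype=np.float32).reshape(1, 1, 3) / 255.0
    lab = cv2.cvtColor(rgb, cv2.COLOR_RGB2Lab)
    return lab.reshape(3)

def delta_e76(c1, c2):
    """Plain Euclidean distance in Lab (the original CIE76 delta E)."""
    return float(np.linalg.norm(rgb_to_lab(c1) - rgb_to_lab(c2)))

# Example: compare two similar reds
print(delta_e76([200, 30, 30], [180, 40, 35]))
```

For segmentation, the absolute accuracy of the conversion matters less than the fact that distances in Lab track perceived differences much better than raw RGB distances do.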
I am trying to understand how to properly work with the RGB values found in the PNM formats, in order to eventually convert them to grayscale.
Researching the subject, it appears that if the RGB values are nonlinear, then I would need to first convert them to a linear RGB color space, apply my weights, and then convert them back to the same nonlinear color space.
The PPM specification (http://netpbm.sourceforge.net/doc/ppm.html) describes the expected format:
In the raster, the sample values are "nonlinear." They are proportional to the intensity of the ITU-R Recommendation BT.709 red, green, and blue in the pixel, adjusted by the BT.709 gamma transfer function.
So I take it these values are nonlinear, but not sRGB. I found some forum threads about ImageMagick saying that it might save them as linear RGB values.
Am I correct that PNM specifies a standard, but various editors like Photoshop or GIMP may or may not follow it?
From http://netpbm.sourceforge.net/doc/pamrecolor.html
When you use this option, the input and output images are not true Netpbm images, because the Netpbm image format specifies a particular color space. Instead, you are using a variation on the format in which the sample values in the raster have different meaning. Many programs that ostensibly use Netpbm images actually use a variation with a different color space. For example, GIMP uses sRGB internally and if you have GIMP generate a Netpbm image file, it really generates a variation of the format that uses sRGB.
Elsewhere, I see this (http://netpbm.sourceforge.net/doc/pgm.html):
Each gray value is a number proportional to the intensity of the pixel, adjusted by the ITU-R Recommendation BT.709 gamma transfer function. (That transfer function specifies a gamma number of 2.2 and has a linear section for small intensities). A value of zero is therefore black. A value of Maxval represents CIE D65 white and the most intense value in the image and any other image to which the image might be compared.
BT.709's range of channel values (16-240) is irrelevant to PGM.
Note that a common variation from the PGM format is to have the gray value be "linear," i.e. as specified above except without the gamma adjustment. pnmgamma takes such a PGM variant as input and produces a true PGM as output.
Most sources out there assume they are dealing with linear RGB, and just apply their weights and save, possibly not preserving the luminance. I assume that any compliant renderer will assume that these RGB values are gamma compressed... thus technically displaying different grayscale "colors" than what I had specified. Is this correct?
Maybe to ask it differently: does it matter? I know it is a loaded question, but if I can't really tell whether the values are linear or nonlinear, or how they have been compressed or are expected to be compressed, will image processing algorithms (binarization) be greatly affected if I just assume linear RGB values?
There may have been some confusion with my question, so I would like to answer it now that I have researched the situation much further.
To make a long story short... it appears that no one really bothers to re-encode an image's gamma when saving to PNM format. Because of that, and since almost everything is sRGB, the data stays sRGB rather than the technically correct BT.709 called for by the spec.
I reached out to Bryan Henderson of NetPBM. He held the same view and stated that the exact method of gamma compression is not as important as knowing whether it was applied at all, and that we should always assume it is applied when working with the PNM color formats.
To see how that opinion plays out in image processing, read "Color-to-Grayscale: Does the Method Matter in Image Recognition?" (Kanan and Cottrell, 2012). Basically, if you take the mean of the RGB values you end up in one of three situations: Gleam, Intensity', or Intensity. After comparing the effects of different grayscale conversion formulas, taking into account when and how gamma correction was applied, they found that Gleam and Intensity' were the best performers. The two differ only in when the gamma correction is applied: Gleam applies it to the input RGB values, while Intensity' takes linear RGB and applies gamma afterwards. Sadly, you drop from 1st and 2nd place down to 8th when no gamma correction is applied at all, i.e. Intensity. It is interesting that the simple mean formula worked best, not one of the more popular grayscale formulas most people tout. All of that to say: if you use the mean formula to convert PNM color to grayscale for image processing, you can expect good performance, since we can assume some gamma compression has been applied. My earlier comment about ImageMagick and linear values appears to apply only to their PGM format.
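To make the three variants concrete, here is a rough sketch (my own, not taken from the paper); I'm approximating the sRGB/BT.709 transfer curve with a simple 2.2 gamma, and values are assumed to be floats in [0, 1]:

```python
import numpy as np

GAMMA = 1.0 / 2.2  # crude stand-in for the sRGB/BT.709 transfer curve

def gleam(rgb_linear):
    """Gamma-compress each channel first, then take the mean."""
    return (rgb_linear ** GAMMA).mean(axis=-1)

def intensity_prime(rgb_linear):
    """Take the mean of the linear channels, then gamma-compress."""
    return rgb_linear.mean(axis=-1) ** GAMMA

def intensity(rgb_linear):
    """Plain mean of the linear channels, no gamma at all."""
    return rgb_linear.mean(axis=-1)
```

Note that if your file already stores gamma-compressed values (the usual case for PNM in practice, as discussed above), simply taking the mean of the stored values is effectively Gleam.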
I hope that helps!
There is only one good way to convert a colour signal to greyscale: go to linear space and add the light (i.e. the colour intensities). That gives you the effective light, from which you can calculate the brightness; then you can gamma-correct the result. This is how light behaves (it adds linearly), and how brightness was measured by the CIE (per wavelength).
In television it is standard to build luma (and hence black-and-white images) from non-linear R', G', B'. This was done for simplicity and because of the way analog colour television (NTSC and PAL) worked: the black-and-white signal was the main signal (for B&W sets), and colour was added on a subcarrier. For this reason, the calculations are done in non-linear space.
Video often uses such factors (in non-linear space) because they are much quicker to calculate, and you can do it easily with integers (there are special matrices for integer arithmetic).
For edge-detection algorithms it should not matter much which method you use: we have difficulty detecting edges between regions with similar L or Y', so we don't mind if computers have the same problem.
Note: our eyes respond non-linearly to light intensity, with a gamma similar to that of the phosphors in old televisions. For this reason using gamma-corrected values is useful: it compresses the information in a near-optimal way (or, in the analog-TV past, it reduced perceived noise).
So if you want Y', compute it from non-linear R', G', B'. But if you need a true greyscale, you need to calculate it by going to linear space.
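As an illustration of that recommendation, here is a minimal sketch (my own, not from this answer) that decodes to linear light, takes the Rec.709-weighted sum, and re-encodes; I am assuming the sRGB transfer curve for the decode/encode step:

```python
import numpy as np

W709 = np.array([0.2126, 0.7152, 0.0722])  # Rec.709 luminance weights

def srgb_to_linear(c):
    """sRGB decoding transfer function (values in [0, 1])."""
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    """Inverse of the above, back to gamma-encoded values."""
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)

def grayscale_linear_light(rgb):
    """Greyscale via linear light: decode, weight, sum, re-encode."""
    y = srgb_to_linear(rgb) @ W709  # true relative luminance
    return linear_to_srgb(y)        # gamma-encode again for display
```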
You will see differences especially on mid-greys, and on purple or yellow, where two of R, G, B are nearly the same (and are the largest of the three).
In photography programs, on the other hand, there are many different algorithms to convert RGB to greyscale: we do not see the world in greyscale, so different (possibly non-linear) weights can help bring out certain parts of the image, which is the purpose of greyscale photos (removing distracting colours).
Note that Rec.709 never specified the gamma correction to apply for display (the OETF in the standard is not what we need; we need the EOTF, and for practical reasons one is often not the inverse of the other). Only a later recommendation (BT.1886) finally provided this missing information. But because many people talk about Rec.709, the inverse of its OETF is often used as the gamma, which is incorrect.
How to tell which one you have: the classic test is a yellow sun on a blue sky, choosing a yellow and a blue with the same L. If you can still see the sun in the grey image, you are converting in non-linear space (the Y' values are not equal). If the sun disappears, you are converting linearly.
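Here is a tiny numerical version of that test (my own sketch, using Rec.709 weights and a plain 2.2 gamma as a stand-in for the encoding): the two colours have identical linear luminance Y, so a linear conversion maps them to the same grey, while the luma Y' computed from gamma-encoded values differs and the "sun" stays visible:

```python
import numpy as np

W709 = np.array([0.2126, 0.7152, 0.0722])
GAMMA = 2.2  # simple stand-in for the encoding gamma

# Linear-light blue, and a yellow scaled so both have the same luminance Y.
blue = np.array([0.0, 0.0, 1.0])
yellow = np.array([1.0, 1.0, 0.0]) * (W709 @ blue) / (W709 @ np.array([1.0, 1.0, 0.0]))

Y_blue, Y_yellow = W709 @ blue, W709 @ yellow   # equal -> the sun disappears
Yp_blue = W709 @ blue ** (1.0 / GAMMA)          # luma from gamma-encoded values
Yp_yellow = W709 @ yellow ** (1.0 / GAMMA)      # differs -> the sun stays visible
print(Y_yellow, Y_blue, Yp_yellow, Yp_blue)
```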
I would like to compute the position and orientation of a camera in a civil aircraft cockpit.
I use LEDs as fixed points. My plan is to store the X, Y, Z position associated with each LED.
How can I detect and identify my LEDs on my images? Which feature descriptor and feature point extractor should I use?
How should I modify my image prior to feature detection?
I would like the approach to be efficient.
Now, having found the solution to my problem, I realize the question might have been too generic.
Anyway, to help other people who find this via Google, I am going to describe my answer.
Using combinations of OpenCV's functions, I create masks in which areas that could contain LEDs are white and the rest of the image is black. These functions are, for example, Core.inRange, Imgproc.dilate, and Imgproc.erode. With Imgproc.findContours I filter out contours that are too large or too small, and I combine masks with Core.bitwise_and and Core.bitwise_not.
The masks are computed from an image in the HSV color space as input.
Having these masks of potential LED areas, I compute colour histograms of the intensity-normalized RGB colours (hue did not work well enough for me). These histograms are trained and normalized using a set of annotated input images and serve as my descriptor.
At run time I match the trained descriptor against the computed ones using histogram intersection.
So I obtain distance measures. Using a threshold on these measures, together with the knowledge of the geometric positions of the real-life LEDs, I translate the patches into a graph, which helps me find the longest chain of potential LEDs.
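For anyone who wants a starting point, here is a rough Python/OpenCV sketch of the same idea (the original pipeline used the Java bindings); the HSV bounds are made-up placeholders you would have to tune for your own images:

```python
import cv2
import numpy as np

# Hypothetical HSV bounds for bright, low-saturation LED blobs -- tune these.
LOWER = np.array([0, 0, 200], dtype=np.uint8)
UPPER = np.array([180, 60, 255], dtype=np.uint8)

def led_mask(bgr):
    """White where a pixel could belong to an LED, black elsewhere."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(mask, kernel)   # close small gaps
    mask = cv2.erode(mask, kernel)    # remove speckle
    return mask

def match_score(trained_hist, candidate_hist):
    """Histogram intersection; larger means a better match."""
    return cv2.compareHist(trained_hist, candidate_hist, cv2.HISTCMP_INTERSECT)
```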
I am confused about the need to change color space for color comparison. I have read about delta E and the Lab color space, and I understand that comparisons in the RGB color space will not match human perception. However, my program uses a linear color scale to calculate velocity from a color flow Doppler signal. It takes the mean color of a sample region and compares it to the colors of the scale to find its nearest neighbor using Euclidean distance. I do this entirely in the BGR (OpenCV) color space, as in the example image below:
Here, I obtain seemingly correct velocity values for each color circle, but is it only by chance, or is my assumption correct that since the color comparisons take place internally, it does not matter what color space I am in?
Since you are searching for the nearest neighbour and operating on 3D points (in a colour space), it does not matter which colour space you choose; the points will only be represented in different ways.
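Concretely, the lookup the question describes boils down to something like this sketch (hypothetical names; `scale_bgr` would be the sampled colours of your colour bar, one row per scale entry):

```python
import numpy as np

def nearest_scale_index(patch_bgr, scale_bgr):
    """Index of the scale colour closest (Euclidean) to the patch's mean colour.

    patch_bgr: HxWx3 sample region, scale_bgr: Nx3 colour-bar entries,
    both in the same colour space (BGR here).
    """
    mean = patch_bgr.reshape(-1, 3).astype(np.float64).mean(axis=0)
    dists = np.linalg.norm(scale_bgr.astype(np.float64) - mean, axis=1)
    return int(np.argmin(dists))
```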
Comparing colours is not straightforward. You need to decide what it means for one colour to be close to another, and then pick the most appropriate colour space to support that.
For example, working in HSL gives you an easy way to assess colours based upon hue. This is fine if you are happy to disregard, or at least reduce the relevance of, saturation and luminance.
If, on the other hand, you want a one-point change in saturation to be as relevant as a one-point change in hue, working in RGB or perhaps CMYK would be more appropriate: treat the channels as three axes and measure the distance between the two colours. This has the downside that a 10-point shift in saturation gives the same measured difference as a 10-point shift in hue, which visually does not make much sense, since the perceived difference will not be equivalent to the mathematical one.
And that brings in another consideration. The human eye's sensitivity to colour variation differs from colour to colour. Green, for example, takes more variation to be noticeable than magenta does. This is all down to evolution, but it may have a bearing on your representation.
Personally, I tend to work in RGB since it is needed for display, but I most commonly arrange colours by hue, so I keep a conversion to HSL/HSB handy.
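To illustrate the two notions of closeness mentioned above, here is a small sketch (standard library only) comparing an angular hue distance, which ignores saturation and lightness and handles the wrap-around at 360 degrees, with a plain RGB Euclidean distance, where every channel counts equally:

```python
import colorsys
import math

def hue_distance(rgb1, rgb2):
    """Angular difference of hue in degrees; rgb values in [0, 1]."""
    h1 = colorsys.rgb_to_hls(*rgb1)[0] * 360.0
    h2 = colorsys.rgb_to_hls(*rgb2)[0] * 360.0
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)  # wrap around the hue circle

def rgb_distance(rgb1, rgb2):
    """Plain Euclidean distance, where every channel counts equally."""
    return math.dist(rgb1, rgb2)
```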
Is there a better way to convert an (RGB) image to grayscale than the usual weighted sum of the channels (the BT.601 luma weights, Y = 0.299 R + 0.587 G + 0.114 B)?
This produces light intensity, which may not delineate objects well for further processing. For example, a hotspot or reflection will show up as a noticeable object in such a grayscale image.
I am experimenting with other color spaces like Lab, but they have poor contrast.
It's not as simple as asking for a recipe - you need to define what you need.
The transform you used harks back to the early days of color TV, when there was a need for a way to separately encode the luminance and chrominance in the analog broadcast signal, taking into account the fact that a lot less bandwidth was available to transmit chroma than luma. The encoding is very loosely related to the higher relative sensitivity of the cones in the human retina in the yellow-green band.
There is no reason to use it blindly. Rather, you need to clearly express what the goal of your desired transformation is, translate that goal into a (quantifiable) criterion, then find a particular transform that optimizes that criterion. The transform can be global (i.e. like the TV one you used) or adaptive (i.e. depending on the color values in a neighborhood of the current pixel), and either way it can be linear (like, again, the TV one) or not.
Since people can actually identify "shadow" and "reflection", it stands to reason that this is a fairly high-level operation. In addition, a person can be "blinded" or confused by these effects. So I will go with: "No, there is no significantly better, low-level way to eliminate different luminance effects."
You can build a module that detects adjacent lightness-distorted regions (based on cues like hue and chroma, spatial factors such as whether they fit together like a "jigsaw puzzle", etc.) and stitches them together.
I recommend HSV because it has worked quite reliably for me for overcoming shadows in images.
A trick you can use with Lab is to simply ignore the L channel; the other two channels then describe only the variation in color. This can be very effective if you want to find the boundaries of an object that has a bright light shining on it.
There are many other color spaces that separate brightness from color information, like Lab. Some examples are HSV, YUV, YCrCb. Just pick whichever of these works best, discard the brightness and work with two channels of color.
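As a small sketch of the "ignore the L channel" trick with OpenCV (assuming an 8-bit BGR input):

```python
import cv2
import numpy as np

def chroma_channels(bgr):
    """Return only the a and b channels of Lab, discarding lightness.

    Comparing or segmenting on (a, b) alone is far less sensitive to
    shadows and bright highlights than working on the full colour.
    """
    lab = cv2.cvtColor(bgr.astype(np.float32) / 255.0, cv2.COLOR_BGR2Lab)
    return lab[..., 1:]  # drop L, keep a and b
```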
Lab is a "perceptual" color space that attempts to match non-linearities in the eye. That is, Lab numbers that are close together will be perceived as very similar by a human, while Lab numbers that differ greatly will be perceived as very different. RGB does not behave like that.
Some notes about the conversion you mentioned:
If you use the CV_RGB2GRAY conversion in OpenCV, it uses the coefficients you mentioned. However, whether these are the right numbers depends on the flavor of RGB you have.
Your numbers are for BT.601 primaries as used in analogue TV such as NTSC and PAL.
Newer HDTV, and sRGB (which is widely used in computer monitors and printers), use BT.709 primaries, in which case the conversion should be Y = 0.2126 R + 0.7152 G + 0.0722 B, where Y is as defined by CIE 1931. The L channel of Lab also corresponds to the CIE 1931 luminance value.
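For comparison, a quick sketch of the two weightings side by side (whether you apply them to gamma-encoded or to linearized values is a separate question, discussed elsewhere on this page):

```python
import numpy as np

W_BT601 = np.array([0.299, 0.587, 0.114])     # what CV_RGB2GRAY uses
W_BT709 = np.array([0.2126, 0.7152, 0.0722])  # matches sRGB/HDTV primaries

def weighted_gray(rgb, weights):
    """Weighted sum over the channel axis; rgb ordered R, G, B."""
    return np.tensordot(rgb, weights, axes=([-1], [0]))
```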
Then there is Adobe RGB, which can represent more colors than sRGB (it has a wider "gamut"). But I don't think OpenCV has a conversion for it.
The best way to convert RGB to grayscale depends on where your image comes from and what you want to do with it.
It would be worth looking at the OpenCV cvtColor() documentation.
There are a few works in this field. For example this one: http://dl.acm.org/citation.cfm?id=2407754
I'm looking for a way to get a complete list of all the RGB values of the pixels in a given image using OpenCV; I have been calling this "color quantization".
The problem is that, according to what I have found online so far, "color quantization" is about histograms, "color reduction", or similar discrete computation techniques.
Since I know what I want, and the internet seems to have a different opinion about what these words mean, I am wondering: is there perhaps no real solution for this? Is there a workable approach or a working algorithm in the OpenCV library?
Generally speaking, quantization is an operation that maps an input signal with real (mathematical) values to a set of discrete values. One possible algorithm is to compute the histogram of the data and then retain the n values that correspond to the n most populated bins.
What you are trying to do would perhaps be called color listing.
If you are working with 8-bit quantized images (type CV_8UC3), my guess is that you can do what you want by taking the histogram of the input image (with a bin width of 1) and then searching the result for non-empty bins.
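In practice, with NumPy that "non-empty bins" search collapses to a one-liner over the flattened pixels; here is a small sketch (sorting the counts also gives you the n most populated bins mentioned above, i.e. an actual quantization palette):

```python
import numpy as np

def list_colors(img_bgr):
    """Every distinct 8-bit colour in the image, with its pixel count.

    Equivalent to a 256x256x256 histogram (bin width 1) in which only
    the non-empty bins are kept.
    """
    pixels = img_bgr.reshape(-1, 3)
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    return colors, counts

def top_n_colors(img_bgr, n):
    """The n most frequent colours -- a crude quantization palette."""
    colors, counts = list_colors(img_bgr)
    return colors[np.argsort(counts)[::-1][:n]]
```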
Color quantization is the conversion of the infinite range of natural colors into a finite digital color space. Anyway, to create a full-color "histogram" you can use OpenCV's sparse matrix implementation and write your own function to compute it. Of course, you have to access the pixels one by one if you have no other structural or continuity information about the image.