Converting RGB to grayscale/intensity - image-processing

When converting from RGB to grayscale, it is said that specific weights to channels R, G, and B ought to be applied. These weights are: 0.2989, 0.5870, 0.1140.
It is said that the reason for this is different human perception/sensibility towards these three colors. Sometimes it is also said these are the values used to compute NTSC signal.
However, I didn't find a good reference for this on the web. What is the source of these values?
See also these previous questions: here and here.

The specific numbers in the question are from CCIR 601 (see Wikipedia article).
If you convert RGB -> grayscale with slightly different numbers / different methods,
you won't see much difference at all on a normal computer screen
under normal lighting conditions -- try it.
Here are some more links on color in general:
Wikipedia Luma
Bruce Lindbloom 's outstanding web site
chapter 4 on Color in the book by Colin Ware, "Information Visualization", isbn 1-55860-819-2;
this long link to Ware in books.google.com
may or may not work
cambridgeincolor :
excellent, well-written
"tutorials on how to acquire, interpret and process digital photographs
using a visually-oriented approach that emphasizes concept over procedure"
Should you run into "linear" vs "nonlinear" RGB,
here's part of an old note to myself on this.
Repeat, in practice you won't see much difference.
### RGB -> ^gamma -> Y -> L*
In color science, the common RGB values, as in html rgb( 10%, 20%, 30% ),
are called "nonlinear" or
Gamma corrected.
"Linear" values are defined as
Rlin = R^gamma, Glin = G^gamma, Blin = B^gamma
where gamma is 2.2 for many PCs.
The usual R G B are sometimes written as R' G' B' (R' = Rlin ^ (1/gamma))
(purists tongue-click) but here I'll drop the '.
Brightness on a CRT display is proportional to RGBlin = RGB ^ gamma,
so 50% gray on a CRT is quite dark: .5 ^ 2.2 = 22% of maximum brightness.
(LCD displays are more complex;
furthermore, some graphics cards compensate for gamma.)
To get the measure of lightness called L* from RGB,
first divide R G B by 255, and compute
Y = .2126 * R^gamma + .7152 * G^gamma + .0722 * B^gamma
This is Y in XYZ color space; it is a measure of color "luminance".
(The real formulas are not exactly x^gamma, but close;
stick with x^gamma for a first pass.)
Finally,
L* = 116 * Y ^ 1/3 - 16
"... aspires to perceptual uniformity [and] closely matches human perception of lightness." --
Wikipedia Lab color space

I found this publication referenced in an answer to a previous similar question. It is very helpful, and the page has several sample images:
Perceptual Evaluation of Color-to-Grayscale Image Conversions by Martin Čadík, Computer Graphics Forum, Vol 27, 2008
The publication explores several other methods to generate grayscale images with different outcomes:
CIE Y
Color2Gray
Decolorize
Smith08
Rasche05
Bala04
Neumann07
Interestingly, it concludes that there is no universally best conversion method, as each performed better or worse than others depending on input.

Heres some code in c to convert rgb to grayscale.
The real weighting used for rgb to grayscale conversion is 0.3R+0.6G+0.11B.
these weights arent absolutely critical so you can play with them.
I have made them 0.25R+ 0.5G+0.25B. It produces a slightly darker image.
NOTE: The following code assumes xRGB 32bit pixel format
unsigned int *pntrBWImage=(unsigned int*)..data pointer..; //assumes 4*width*height bytes with 32 bits i.e. 4 bytes per pixel
unsigned int fourBytes;
unsigned char r,g,b;
for (int index=0;index<width*height;index++)
{
fourBytes=pntrBWImage[index];//caches 4 bytes at a time
r=(fourBytes>>16);
g=(fourBytes>>8);
b=fourBytes;
I_Out[index] = (r >>2)+ (g>>1) + (b>>2); //This runs in 0.00065s on my pc and produces slightly darker results
//I_Out[index]=((unsigned int)(r+g+b))/3; //This runs in 0.0011s on my pc and produces a pure average
}

Check out the Color FAQ for information on this. These values come from the standardization of RGB values that we use in our displays. Actually, according to the Color FAQ, the values you are using are outdated, as they are the values used for the original NTSC standard and not modern monitors.

What is the source of these values?
The "source" of the coefficients posted are the NTSC specifications which can be seen in Rec601 and Characteristics of Television.
The "ultimate source" are the CIE circa 1931 experiments on human color perception. The spectral response of human vision is not uniform. Experiments led to weighting of tristimulus values based on perception. Our L, M, and S cones1 are sensitive to the light wavelengths we identify as "Red", "Green", and "Blue" (respectively), which is where the tristimulus primary colors are derived.2
The linear light3 spectral weightings for sRGB (and Rec709) are:
Rlin * 0.2126 + Glin * 0.7152 + Blin * 0.0722 = Y
These are specific to the sRGB and Rec709 colorspaces, which are intended to represent computer monitors (sRGB) or HDTV monitors (Rec709), and are detailed in the ITU documents for Rec709 and also BT.2380-2 (10/2018)
FOOTNOTES
(1) Cones are the color detecting cells of the eye's retina.
(2) However, the chosen tristimulus wavelengths are NOT at the "peak" of each cone type - instead tristimulus values are chosen such that they stimulate on particular cone type substantially more than another, i.e. separation of stimulus.
(3) You need to linearize your sRGB values before applying the coefficients. I discuss this in another answer here.

Starting a list to enumerate how different software packages do it. Here is a good CVPR paper to read as well.
FreeImage
#define LUMA_REC709(r, g, b) (0.2126F * r + 0.7152F * g + 0.0722F * b)
#define GREY(r, g, b) (BYTE)(LUMA_REC709(r, g, b) + 0.5F)
OpenCV
nVidia Performance Primitives
Intel Performance Primitives
Matlab
nGray = 0.299F * R + 0.587F * G + 0.114F * B;

These values vary from person to person, especially for people who are colorblind.

is all this really necessary, human perception and CRT vs LCD will vary, but the R G B intensity does not, Why not L = (R + G + B)/3 and set the new RGB to L, L, L?

Related

Implementing convolution from scratch in Julia

I am trying to implement convolution by hand in Julia. I'm not too familiar with image processing or Julia, so maybe I'm biting more than I can chew.
Anyway, when I apply this method with a 3*3 edge filter edge = [0 -1 0; -1 4 -1; 0 -1 0] as convolve(img, edge), I am getting an error saying that my values are exceeding the allowed values for the RGBA type.
Code
function convolve(img::Matrix{<:Any}, kernel)
(half_kernel_w, half_kernel_h) = size(kernel) .÷ 2
(width, height) = size(img)
cpy_im = copy(img)
for row ∈ 1+half_kernel_h:height-half_kernel_h
for col ∈ 1+half_kernel_w:width-half_kernel_w
from_row, to_row = row .+ (-half_kernel_h, half_kernel_h)
from_col, to_col = col .+ (-half_kernel_h, half_kernel_h)
cpy_im[row, col] = sum((kernel .* RGB.(img[from_row:to_row, from_col:to_col])))
end
end
cpy_im
end
Error (original)
ArgumentError: element type FixedPointNumbers.N0f8 is an 8-bit type representing 256 values from 0.0 to 1.0, but the values (-0.0039215684f0, -0.007843137f0, -0.007843137f0, 1.0f0) do not lie within this range.
See the READMEs for FixedPointNumbers and ColorTypes for more information.
I am able to identify a simple case where such error may occur (a white pixel surrounded by all black pixels or vice-versa). I tried "fixing" this by attempting to follow the advice here from another stackoverflow question, but I get more errors to the effect of Math on colors is deliberately undefined in ColorTypes, but see the ColorVectorSpace package..
Code attempting to apply solution from the other SO question
function convolve(img::Matrix{<:Any}, kernel)
(half_kernel_w, half_kernel_h) = size(kernel) .÷ 2
(width, height) = size(img)
cpy_im = copy(img)
for row ∈ 1+half_kernel_h:height-half_kernel_h
for col ∈ 1+half_kernel_w:width-half_kernel_w
from_row, to_row = row .+ [-half_kernel_h, half_kernel_h]
from_col, to_col = col .+ [-half_kernel_h, half_kernel_h]
cpy_im[row, col] = sum((kernel .* RGB.(img[from_row:to_row, from_col:to_col] ./ 2 .+ 128)))
end
end
cpy_im
end
Corresponding error
MethodError: no method matching +(::ColorTypes.RGBA{Float32}, ::Int64)
Math on colors is deliberately undefined in ColorTypes, but see the ColorVectorSpace package.
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:591
+(!Matched::T, ::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at int.jl:87
+(!Matched::ChainRulesCore.AbstractThunk, ::Any) at ~/.julia/packages/ChainRulesCore/a4mIA/src/tangent_arithmetic.jl:122
Now, I can try using convert etc., but when I look at the big picture, I start to wonder what the idiomatic way of solving this problem in Julia is. And that is my question. If you had to implement convolution by hand from scratch, what would be a good way to do so?
EDIT:
Here is an implementation that works, though it may not be idiomatic
function convolve(img::Matrix{<:Any}, kernel)
(half_kernel_h, half_kernel_w) = size(kernel) .÷ 2
(height, width) = size(img)
cpy_im = copy(img)
# println(Dict("width" => width, "height" => height, "half_kernel_w" => half_kernel_w, "half_kernel_h" => half_kernel_h, "row range" => 1+half_kernel_h:(height-half_kernel_h), "col range" => 1+half_kernel_w:(width-half_kernel_w)))
for row ∈ 1+half_kernel_h:(height-half_kernel_h)
for col ∈ 1+half_kernel_w:(width-half_kernel_w)
from_row, to_row = row .+ (-half_kernel_h, half_kernel_h)
from_col, to_col = col .+ (-half_kernel_w, half_kernel_w)
vals = Dict()
for method ∈ [red, green, blue, alpha]
x = sum((kernel .* method.(img[from_row:to_row, from_col:to_col])))
if x > 1
x = 1
elseif x < 0
x = 0
end
vals[method] = x
end
cpy_im[row, col] = RGBA(vals[red], vals[green], vals[blue], vals[alpha])
end
end
cpy_im
end
First of all, the error
Math on colors is deliberately undefined in ColorTypes, but see the ColorVectorSpace package.
should direct you to read the docs of the ColorVectorSpace package, where you will learn that using ColorVectorSpace will now enable math on RGB types. (The absence of default support it deliberate, because the way the image-processing community treats RGB is colorimetrically wrong. But everyone has agreed not to care, hence the ColorVectorSpace package.)
Second,
ArgumentError: element type FixedPointNumbers.N0f8 is an 8-bit type representing 256 values from 0.0 to 1.0, but the values (-0.0039215684f0, -0.007843137f0, -0.007843137f0, 1.0f0) do not lie within this range.
indicates that you're trying to write negative entries with an element type, N0f8, that can't support such values. Instead of cpy_im = copy(img), consider something like cpy_im = [float(c) for c in img] which will guarantee a floating-point representation that can support negative values.
Third, I would recommend avoiding steps like RGB.(img...) when nothing about your function otherwise addresses whether images are numeric, grayscale, or color. Fundamentally the only operations you need are scalar multiplication and addition, and it's better to write your algorithm generically leveraging only those two properties.
Tim Holy's answer above is correct - keep things simple and avoid relying on third-party packages when you don't need to.
I might point out that another option you may not have considered is to use a different algorithm. What you are implementing is the naive method, whereas many convolution routines using different algorithms for different sizes, such as im2col and Winograd (you can look these two up, I have a website that covers the idea behind both here).
The im2col routine might be worth doing as essentially you can break the routine in several pieces:
Unroll all 'regions' of the image to do a dot-product with the filter/kernel on, and stack them together into a single matrix.
Do a matrix-multiply with the unrolled input and filter/kernel.
Roll the output back into the correct shape.
It might be more complicated overall, but each part is simpler, so you may find this easier to do. A matrix multiply routine is definitely quite easy to implement. For 1x1 (single-pixel) convolutions where the image and filter have the same ordering (i.e. NCHW images and FCHW filter) the first and last steps are trivial as essentially no rolling/unrolling is necessary.
A final word of advice - start simpler and add in the code to handle edge-cases, convolutions are definitely fiddly to work with.
Hope this helps!

Difference between absdiff and normal subtraction in OpenCV

I am currently planning on training a binary image classification model. The images I want to train on are the difference between two original pictures. In other words, for each data entry, I start out with 2 pictures, take their difference, and the label that difference as a 0 or 1. My question is what is the best way to find this difference. I know about cv2.absdiff and then normal subtraction of images - what is the most effective way to go about this?
About the data: The images I'm training on are screenshots that usually are the same but may have small differences. I found that normal subtraction seems to show the differences less than absdiff.
This is the code I use for absdiff:
diff = cv2.absdiff(img1, img2)
mask = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
th = 1
imask = mask>1
canvas = np.zeros_like(img2, np.uint8)
canvas[imask] = img2[imask]
And then this for normal subtraction:
def extract_diff(self,imageA, imageB, image_name, path):
subtract = imageB.astype(np.float32) - imageA.astype(np.float32)
mask = cv2.inRange(np.abs(subtract),(30,30,30),(255,255,255))
th = 1
imask = mask>1
canvas = np.zeros_like(imageA, np.uint8)
canvas[imask] = imageA[imask]
Thanks!
A difference can be negative or positive.
For some number types, such as uint8 (unsigned 8-bit int), which can't be negative (have no sign), a negative value wraps around and the value would make no sense anymore. Other types can be signed (e.g. floats, signed ints), so a negative value can be represented correctly.
That's why cv.absdiff exists. It always gives you absolute differences, and those are okay to represent in an unsigned type.
Example with numbers: a = 4, b = 6. a-b should be -2, right?
That value, as an uint8, will wrap around to become 0xFE, or 254 in decimal. The 254 value has some relation to the true -2 difference, but it also incorporates the range of values of the data type (8 bits: 256 values), so it's really just "code".
cv.absdiff would give you the absolute of the difference (-2), which is 2.

Making an image appear only in grayscale

I was writing a program that would scramble the colors of each pixel in an image so that the resulting image would be very noisy and hard to recogize but that it would appear normal in grayscale.
As soon as I wrote the program, I noticed a problem. The alogrithm I used assumed that the way to calculate grayscale was something like
Grayscale(Pixel p) {
average := (p.red + p.green + p.blue) / 3
p.red := p.green := p.blue := average;
}
I later realized that the numbers have weights attached to them:
Grayscale(Pixel p) {
average := (c1 * p.red^gamma) + (c2 * p.green^gamma) + (c3 * p.blue^gamma)
p.red := p.green := p.blue := average
}
and thus my program did not work.
I was specifically targeting the iOS grayscale filter, although the algorithm should be the same for any system.
(side note: I don't know the weights of the iOS values, so if anyone could give me those in the algorithm as well, that'd be pretty cool. I found this thread discussing a similar problem but it doesn't answer my question: https://www.reddit.com/r/compsci/comments/a1yrf1/what_algorithm_do_ios_devices_use_to_grayscale/ )
Assuming that the grayscale uses some sort of weights, what algorithm could I use to change the rgb values of the given pixel while keeping the grayscale values of the pixel the same?

Using own descriptor and feature in Visual Structure From Motion

Hi i'm using program Visual Structure From Motion to recover the structure of a 3d-place. However, i 've already computed my descriptors and my features; so i want to use them in Visual Structure From Motion.I've read that the file which contains informations about descriptor should has the following pattern:
[Header][Location Data][Descriptor Data][EOF]
[Header] = int[5] = {name, version, npoint, 5, 128};
name = ('S'+ ('I'<<8)+('F'<<16)+('T'<<24));
version = ('V'+('4'<<8)+('.'<<16)+('0'<<24)); or ('V'+('5'<<8)+('.'<<16)+('0'<<24)) if containing color info
npoint = number of features.
[Location Data] is a npoint x 5 float matrix and each row is [x, y, color, scale, orientation].
Write color by casting the float to unsigned char[4]
scale & orientation are only used for visualization, so you can simply write 0 for them
Sort features in the order of decreasing importance, since VisualSFM may use only part of those features.
VisualSFM sorts the features in the order of decreasing scales.
[Descriptor Data] is a npoint x 128 unsigned char matrix. Note the feature descriptors are normalized to 512.
[EOF] int eof_marker = (0xff+('E'<<8)+('O'<<16)+('F'<<24));
There's someone that write a concrete example of this file? This file should be generated automatically by my application.

OpenCV: Essential Matrix Decomposition

I am trying to extract Rotation matrix and Translation vector from the essential matrix.
<pre><code>
SVD svd(E,SVD::MODIFY_A);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W(0,-1,0,
1,0,0,
0,0,1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; //or svd_u * Mat(W) * svd_vt;
Mat_<double> t = svd_u.col(2); //or -svd_u.col(2)
</code></pre>
However, when I am using R and T (e.g. to obtain rectified images), the result does not seem to be right(black images or some obviously wrong outputs), even so I used different combination of possible R and T.
I suspected to E. According to the text books, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w which is supposed to be diag(1, 1, 0) [at least in term of a scale], is not so. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and calculation of R and T done in a right way?
If the calculation is right, why the internal rules of essential matrix are not satisfied by the results?
If everything about E, R, and T is fine, why the rectified images obtained by them are not correct?
I get E from fundamental matrix, which I suppose to be right. I draw epipolar lines on both the left and right images and they all pass through the related points (for all the 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
Thanks!
I see two issues.
First, discounting the negligible value of the third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100 . Sounds small, but translated in terms of pixel error it may be an altogether larger amount.
So I'd start by replacing E with the ideal one after SVD decomposition: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with 3D points in front of the camera). You may be hitting one of non-physical ones. See http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E .

Resources