I'm trying to understand these calculations for YUV420P to RGB conversion on an OpenGL fragment shader. On https://en.wikipedia.org/wiki/YUV there are lots of calculations but none of them look like the one below. Why take 0.0625 and 0.5 and 0.5 in the first part? And where did the second part come from?
yuv.r = texture(tex_y, TexCoord).r - 0.0625;
yuv.g = texture(tex_u, TexCoord).r - 0.5;
yuv.b = texture(tex_v, TexCoord).r - 0.5;
rgba.r = yuv.r + 1.596 * yuv.b;
rgba.g = yuv.r - 0.813 * yuv.b - 0.391 * yuv.g;
rgba.b = yuv.r + 2.018 * yuv.g;
It may be a special color conversion for some specific YUV color scheme, but I couldn't find anything on the internet.
Why take [...] 0.5 and 0.5 in the first part?
The U and V values are read from the tex_u and tex_v textures into the green and blue components of the yuv vector. The values in the texture color channels are stored in the range [0.0, 1.0]. For the computations, the values have to be mapped to the range [-0.5, 0.5]:
yuv.g = texture(tex_u, TexCoord).r - 0.5;
yuv.b = texture(tex_v, TexCoord).r - 0.5;
0.0625 is 16/256: the luma plane is stored in the limited ("TV") range, where black sits at 16 rather than 0, so that offset has to be removed. Subtracting it from the luma value up front is just an optimization; it does not have to be subtracted separately in each expression later.
The algorithm is the same as in How to convert RGB -> YUV -> RGB (both ways) or various books.
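For illustration, here is a minimal C sketch (not part of the linked answer; the helper names are made up) that applies the same offsets and coefficients on the CPU for a single pixel:

/* Clamp a float to [0, 255] and round to an 8-bit value. */
static unsigned char clamp_u8(float v) {
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (unsigned char)(v + 0.5f);
}

/* Convert one 8-bit Y/U/V sample triple to 8-bit RGB using the same
 * constants as the shader: 0.0625 = 16/256 (limited-range luma offset)
 * and 0.5 = 128/256 (chroma midpoint). */
static void yuv420_to_rgb(unsigned char y8, unsigned char u8, unsigned char v8,
                          unsigned char *r, unsigned char *g, unsigned char *b) {
    float y = y8 / 255.0f - 0.0625f;
    float u = u8 / 255.0f - 0.5f;    /* U mapped to [-0.5, 0.5] */
    float v = v8 / 255.0f - 0.5f;    /* V mapped to [-0.5, 0.5] */

    *r = clamp_u8((y + 1.596f * v) * 255.0f);
    *g = clamp_u8((y - 0.813f * v - 0.391f * u) * 255.0f);
    *b = clamp_u8((y + 2.018f * u) * 255.0f);
}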
Related
I've read about the power law (gamma) transformations, so let's look at the equation: s = c * r^γ
Suppose that I have one pixel which has an intensity of 37. If the gamma is 0.4 and c is 1, then the output intensity is 37^0.4, which is about 4.2. Thus it's darker, not brighter. But then why does it look brighter in the example in my textbook?
The gamma transformation applies to data in the range [0,1]. So, for your typical unsigned 8-bit integer image, you would have to scale it first to that range. The equation, including the scaling, then would be:
s = 255 * (r/255)^γ
Now you'd have, for r = 37 and γ = 0.4: s = 255 * (37/255)^0.4 = 117.8. This is brighter.
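A small C sketch of that calculation (my own illustration, not from the textbook):

#include <math.h>
#include <stdio.h>

/* s = 255 * (r/255)^gamma for an 8-bit intensity r. */
static unsigned char gamma_transform(unsigned char r, double gamma) {
    double s = 255.0 * pow(r / 255.0, gamma);
    return (unsigned char)(s + 0.5);
}

int main(void) {
    printf("%d\n", gamma_transform(37, 0.4));  /* prints 118, i.e. brighter than 37 */
    return 0;
}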
I have read the BT.709 spec a number of times, and the thing that is just not clear is: should an encoded H.264 bitstream actually apply any gamma curve to the encoded data? Note the specific mention of a gamma-like formula in the BT.709 spec.

Apple-provided example OpenGL and Metal shaders that read YUV data from CoreVideo-provided buffers do not do any sort of gamma adjustment. YUV values are read and processed as though they were simple linear values. I also examined the source code of ffmpeg and found no gamma adjustments being applied after the BT.709 scaling step.

I then created a test video with just two linear grayscale colors, 5 and 26, corresponding to 2% and 10% levels. When converted to H.264 with both ffmpeg and iMovie, the output BT.709 values are (Y Cb Cr) (20 128 128) and (38 128 128), and these values exactly match the output of the BT.709 conversion matrix without any gamma adjustment.
A great piece of background on this topic can be found at Quicktime Gamma Bug. It seems that some historical issues with QuickTime and Adobe encoders involved improperly applying different gamma adjustments, and the results made video streams look awful on different players.

This is really confusing, because if you compare to sRGB, it clearly indicates how to apply a gamma encoding and then decode it to convert between sRGB and linear. Why does BT.709 go into so much detail about the same sort of gamma adjustment curve if no gamma adjustment is applied after the matrix step when creating an H.264 data stream? Are all the color steps in an H.264 stream meant to be coded as straight linear (gamma 1.0) values?
In case specific example input would make things more clear, I am attaching 3 color bar images, the exact values of different colors can be displayed in an image editor with these image files.
This first image is in the sRGB colorspace and is tagged as sRGB.
This second image has been converted to the linear RGB colorspace and is tagged with a linear RGB profile.
This third image has been converted to REC.709 profile levels with Rec709-elle-V4-rec709.icc from elles_icc_profiles. This seems to be what one would need to do to simulate "camera" gamma as described in BT.709.
Note how the sRGB value in the lower right corner (0x555555) becomes linear RGB (0x171717), and the BT.709 gamma-encoded value becomes (0x464646). What is unclear is whether I should be passing a linear RGB value into ffmpeg, or an already BT.709 gamma-encoded value, which would then need to be decoded in the client after the matrix step that converts back to RGB.
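To make those numbers concrete, here is a small C sketch (my own, using the standard sRGB decode formula and the BT.709 OETF) that reproduces the corner values above:

#include <math.h>
#include <stdio.h>

/* Standard sRGB decode: gamma-encoded [0,1] -> linear [0,1]. */
static float srgb_to_linear(float v) {
    return (v <= 0.04045f) ? (v / 12.92f) : powf((v + 0.055f) / 1.055f, 2.4f);
}

/* BT.709 OETF ("camera" gamma): linear [0,1] -> gamma-encoded [0,1]. */
static float bt709_oetf(float v) {
    return (v < 0.018f) ? (4.5f * v) : (1.099f * powf(v, 0.45f) - 0.099f);
}

int main(void) {
    float lin = srgb_to_linear(0x55 / 255.0f);
    printf("linear RGB: 0x%02X\n", (int)lroundf(lin * 255.0f));             /* 0x17 */
    printf("BT.709 RGB: 0x%02X\n", (int)lroundf(bt709_oetf(lin) * 255.0f)); /* 0x46 */
    return 0;
}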
Update:
Based on the feedback, I have updated my C-based implementation and Metal shader and uploaded them to GitHub as an iOS example project, MetalBT709Decoder.
Encoding a normalized linear RGB value is implemented like this:
static inline
int BT709_convertLinearRGBToYCbCr(
    float Rn,
    float Gn,
    float Bn,
    int *YPtr,
    int *CbPtr,
    int *CrPtr,
    int applyGammaMap)
{
  // Gamma adjustment: map linear values to non-linear (gamma encoded) values
  if (applyGammaMap) {
    Rn = BT709_linearNormToNonLinear(Rn);
    Gn = BT709_linearNormToNonLinear(Gn);
    Bn = BT709_linearNormToNonLinear(Bn);
  }

  // https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.709-6-201506-I!!PDF-E.pdf
  float Ey = (Kr * Rn) + (Kg * Gn) + (Kb * Bn);
  float Eb = (Bn - Ey) / Eb_minus_Ey_Range;
  float Er = (Rn - Ey) / Er_minus_Ey_Range;

  // Quantize Y to the limited range [16, 235] (scale factor 219)
  // Quantize Eb, Er to the limited range [16, 240] (scale factor 224, centered at 128)
  float AdjEy = (Ey * (YMax-YMin)) + 16;
  float AdjEb = (Eb * (UVMax-UVMin)) + 128;
  float AdjEr = (Er * (UVMax-UVMin)) + 128;

  *YPtr = (int) round(AdjEy);
  *CbPtr = (int) round(AdjEb);
  *CrPtr = (int) round(AdjEr);

  return 0;
}
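As a sanity check (a sketch of my own; it assumes the constants and the BT709_linearNormToNonLinear helper from the project above are in scope, with that helper implementing the BT.709 OETF), the 75% gray level discussed further below can be run through this function both ways:

#include <stdio.h>

int main(void) {
    int Y, Cb, Cr;

    // SMPTE 75% gray bar: linear R = G = B = 0.75
    BT709_convertLinearRGBToYCbCr(0.75f, 0.75f, 0.75f, &Y, &Cb, &Cr, 0);
    printf("no gamma:    (%d %d %d)\n", Y, Cb, Cr);   // expect roughly (180 128 128)

    BT709_convertLinearRGBToYCbCr(0.75f, 0.75f, 0.75f, &Y, &Cb, &Cr, 1);
    printf("BT.709 OETF: (%d %d %d)\n", Y, Cb, Cr);   // expect roughly (206 128 128)

    return 0;
}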
Decoding from YCbCr to linear RGB is implemented like so:
static inline
int BT709_convertYCbCrToLinearRGB(
    int Y,
    int Cb,
    int Cr,
    float *RPtr,
    float *GPtr,
    float *BPtr,
    int applyGammaMap)
{
  // https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.709_conversion
  // http://www.niwa.nu/2013/05/understanding-yuv-values/

  // Shift Y by the limited range offset (16) and normalize by 255.
  // The YScale term in the matrix rescales for the limited range [16, 235].
  float Yn = (Y - 16) * (1.0f / 255.0f);

  // Center Cb and Cr at zero (128) and normalize by 255.
  // The UVScale term in the matrix rescales for the limited range [16, 240].
  float Cbn = (Cb - 128) * (1.0f / 255.0f);
  float Crn = (Cr - 128) * (1.0f / 255.0f);

  const float YScale = 255.0f / (YMax-YMin);
  const float UVScale = 255.0f / (UVMax-UVMin);

  const float BT709Mat[] = {
    YScale, 0.000f, (UVScale * Er_minus_Ey_Range),
    YScale, (-1.0f * UVScale * Eb_minus_Ey_Range * Kb_over_Kg), (-1.0f * UVScale * Er_minus_Ey_Range * Kr_over_Kg),
    YScale, (UVScale * Eb_minus_Ey_Range), 0.000f,
  };

  // Matrix multiply: (R' G' B') = BT709Mat * (Y Cb Cr)
  float Rn = (Yn * BT709Mat[0]) + (Cbn * BT709Mat[1]) + (Crn * BT709Mat[2]);
  float Gn = (Yn * BT709Mat[3]) + (Cbn * BT709Mat[4]) + (Crn * BT709Mat[5]);
  float Bn = (Yn * BT709Mat[6]) + (Cbn * BT709Mat[7]) + (Crn * BT709Mat[8]);

  // Saturate normalized (R G B) to the range [0.0, 1.0]
  Rn = saturatef(Rn);
  Gn = saturatef(Gn);
  Bn = saturatef(Bn);

  // Convert non-linear (gamma encoded) RGB back to linear RGB after the matrix transform
  if (applyGammaMap) {
    Rn = BT709_nonLinearNormToLinear(Rn);
    Gn = BT709_nonLinearNormToLinear(Gn);
    Bn = BT709_nonLinearNormToLinear(Bn);
  }

  *RPtr = Rn;
  *GPtr = Gn;
  *BPtr = Bn;

  return 0;
}
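The decode direction can be checked the same way (same assumptions as the encode sketch above): feeding the gamma-encoded gray back in should land close to the original linear 0.75.

#include <stdio.h>

int main(void) {
    float R, G, B;

    // Decode the gamma-encoded 75% gray produced by the encoder above
    BT709_convertYCbCrToLinearRGB(206, 128, 128, &R, &G, &B, 1);
    printf("linear RGB: %.3f %.3f %.3f\n", R, G, B);  // expect roughly 0.75 for each
    return 0;
}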
I believe this logic is implemented correctly, but I am having a very difficult time validating the results. When I generate a .m4v file that contains gamma-adjusted color values (osxcolor_test_image_24bit_BT709.m4v), the results come out as expected. But a test case like (bars_709_Frame01.m4v) that I found here does not seem to work, as the color bar values seem to be encoded as linear (no gamma adjustment).
For a SMPTE test pattern, the 0.75 graylevel is linear RGB (191 191 191), should this RGB be encoded with no gamma adjustment as (Y Cb Cr) (180 128 128) or should the value in the bitstream appear as the gamma adjusted (Y Cb Cr) (206 128 128)?
(Follow-up)
After doing additional research into this gamma issue, it has become clear that what Apple is actually doing in AVFoundation is using a 1.961 gamma function. This is the case when encoding with AVAssetWriterInputPixelBufferAdaptor, when using vImage, or with CoreVideo APIs. This piecewise gamma function is defined as follows:
#define APPLE_GAMMA_196 (1.960938f)
static inline
float Apple196_nonLinearNormToLinear(float normV) {
  const float xIntercept = 0.05583828f;

  if (normV < xIntercept) {
    normV *= (1.0f / 16.0f);
  } else {
    const float gamma = APPLE_GAMMA_196;
    normV = pow(normV, gamma);
  }

  return normV;
}

static inline
float Apple196_linearNormToNonLinear(float normV) {
  const float yIntercept = 0.00349f;

  if (normV < yIntercept) {
    normV *= 16.0f;
  } else {
    const float gamma = 1.0f / APPLE_GAMMA_196;
    normV = pow(normV, gamma);
  }

  return normV;
}
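A quick round-trip sketch (my own) shows that the two curves invert each other and that the breakpoints line up (0.00349 * 16 ≈ 0.0558), assuming both functions above are in scope:

#include <math.h>
#include <stdio.h>

int main(void) {
    // Round trip: encode then decode should reproduce the input value
    for (float v = 0.0f; v <= 1.0f; v += 0.1f) {
        float rt = Apple196_nonLinearNormToLinear(Apple196_linearNormToNonLinear(v));
        printf("%.3f -> %.6f\n", v, rt);
    }
    return 0;
}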
Your original question: Does H.264 encoded video with BT.709 matrix include any gamma adjustment?
The encoded video only contains a gamma adjustment if you feed the encoder gamma-adjusted values.
An H.264 encoder doesn't care about the transfer characteristics.
So if you compress linear and then decompress, you'll get linear.
If you compress gamma-encoded values and then decompress, you'll get gamma-encoded values.
And if your bits are encoded with a Rec. 709 transfer function, the encoder won't change the gamma.
But you can specify the transfer characteristics in the H.264 stream as metadata (Rec. ITU-T H.264 (04/2017), E.1.1 VUI parameters syntax). So the encoded stream carries the color space information around, but it is not used in encoding or decoding.
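As an illustration of that point, here is a sketch using FFmpeg's libavcodec C API (my own example, not from the question): these fields only label the stream, they do not transform the sample values.

#include <libavcodec/avcodec.h>

/* Tag an encoder context with BT.709 color metadata (VUI). This only
 * labels the stream; the encoder does not alter the sample values. */
static void tag_bt709(AVCodecContext *ctx) {
    ctx->color_primaries = AVCOL_PRI_BT709;
    ctx->color_trc       = AVCOL_TRC_BT709;  /* transfer characteristics */
    ctx->colorspace      = AVCOL_SPC_BT709;  /* matrix coefficients */
}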
I would assume that 8-bit video always contains a non-linear transfer function; otherwise you would be using the 8 bits fairly unwisely.
If you convert to linear to do effects and composition, I'd recommend increasing the bit depth or linearizing into floats.
A color space consists of primaries, transfer function and matrix coefficients.
The gamma adjustment is encoded in the transfer function (and not in the matrix).
GPUImage's LookupFilter uses an RGB pixel map that's 512x512. When the filter executes, it creates a comparison between a modified version of this image and the original, and extrapolates an image filter.
The filter code is pretty straightforward. Here's an extract so you can see what's going on:
void main()
{
    highp vec4 textureColor = texture2D(inputImageTexture, textureCoordinate);

    highp float blueColor = textureColor.b * 63.0;

    highp vec2 quad1;
    quad1.y = floor(floor(blueColor) / 8.0);
    quad1.x = floor(blueColor) - (quad1.y * 8.0);

    highp vec2 quad2;
    quad2.y = floor(ceil(blueColor) / 8.0);
    quad2.x = ceil(blueColor) - (quad2.y * 8.0);

    highp vec2 texPos1;
    texPos1.x = (quad1.x * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * textureColor.r);
    texPos1.y = (quad1.y * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * textureColor.g);

    highp vec2 texPos2;
    texPos2.x = (quad2.x * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * textureColor.r);
    texPos2.y = (quad2.y * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * textureColor.g);

    lowp vec4 newColor1 = texture2D(inputImageTexture2, texPos1);
    lowp vec4 newColor2 = texture2D(inputImageTexture2, texPos2);

    lowp vec4 newColor = mix(newColor1, newColor2, fract(blueColor));

    gl_FragColor = mix(textureColor, vec4(newColor.rgb, textureColor.w), intensity);
}
See where the filter map is dependent on this being a 512x512 image?
I'm looking at ways to 4x the color depth here, using a 1024x1024 source image instead, but I'm not sure how this lookup filter image would have originally been generated.
Can something like this be generated in code? If so, I realize it's a very broad question, but how would I go about doing that? If it can't be generated in code, what are my options?
Update:
Turns out the original LUT generation code was included in the header file all along. The questionable part here is from the header file:
Lookup texture is organised as 8x8 quads of 64x64 pixels representing all possible RGB colors:
How is 64x64 a map of all possible RGB colors? 64³ = 262,144, but that only accounts for 1/64th of the presumed 24-bit capacity of RGB, which is 256³ (16,777,216). What's going on here? Am I missing the way this LUT works? How are we accounting for all possible RGB colors with only 1/64th of the data?
for (int by = 0; by < 8; by++) {
    for (int bx = 0; bx < 8; bx++) {
        for (int g = 0; g < 64; g++) {
            for (int r = 0; r < 64; r++) {
                image.setPixel(r + bx * 64, g + by * 64,
                               qRgb((int)(r * 255.0 / 63.0 + 0.5),
                                    (int)(g * 255.0 / 63.0 + 0.5),
                                    (int)((bx + by * 8.0) * 255.0 / 63.0 + 0.5)));
            }
        }
    }
}
I'm not quite sure what problem you are actually having. When you say you want "4x the color depth", what do you actually mean? Color depth normally means the number of bits per color channel (or per pixel), which is totally independent of the resolution of the image.
In terms of lookup table accuracy (which is resolution dependent), assuming you are using bilinear filtered texture inputs from the original texture, and filtered lookups into the transform table, then you are already linearly interpolating between samples in the lookup table. Interpolation of color channels will be at higher precision than the storage format; e.g. often fp16 equivalent, even for textures stored at 8-bit per pixel.
Unless you have a significant amount of non-linearity in your color transform (not that common) adding more samples to the lookup table is unlikely to make a significant difference to the output - the interpolation will already be doing a reasonably good job of filling in the gaps.
Lev Zelensky provided the original work for this, so I'm not as familiar with how this works internally, but you can look at the math being performed in the shader to get an idea of what's going on.
In the 512x512 lookup, you have an 8x8 grid of cells. Within those cells, you have a 64x64 image patch. The red values go from 0 to 255 (0.0 to 1.0 in normalized values) going from left to right in that patch, and the green values go from 0 to 255 going down. That means that there are 64 steps in red, and 64 steps in green.
Each cell then appears to increase the blue value as you progress down the patches, left to right, top to bottom. With 64 patches, that gives you 64 blue values to match the 64 red and green ones. That gives you equal coverage across the RGB values in all channels.
So, if you wanted to double the number of color steps, you'd have to double the patch size to 128x128 and have 128 grids. It'd have to be more of a rectangle due to 128 not having an integer square root. Just going to 1024x1024 might let you double the color depth in the red and green channels, but blue would now be half their depth. Balancing the three out would be a little trickier than just doubling the image size.
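If you did want to generate a finer table in code, a sketch along the lines of the generation snippet above could look like this (plain C, writing into a raw RGB24 buffer instead of a QImage; 128 steps per channel means 128 patches of 128x128, e.g. a 2048x1024 image laid out as a 16x8 grid):

#include <stdlib.h>

/* Fill a 2048x1024 RGB24 buffer with an identity LUT that has 128 steps
 * per channel: a 16x8 grid of 128x128 patches (blue increases per patch). */
static unsigned char *make_identity_lut_128(void) {
    const int steps = 128, gridW = 16, gridH = 8;
    const int width = gridW * steps, height = gridH * steps;
    unsigned char *rgb = malloc((size_t)width * height * 3);
    if (!rgb) return NULL;

    for (int by = 0; by < gridH; by++) {
        for (int bx = 0; bx < gridW; bx++) {
            int b = bx + by * gridW;                      /* blue index, 0..127 */
            for (int g = 0; g < steps; g++) {
                for (int r = 0; r < steps; r++) {
                    size_t i = 3 * ((size_t)(g + by * steps) * width + (r + bx * steps));
                    rgb[i + 0] = (unsigned char)(r * 255.0 / (steps - 1) + 0.5);
                    rgb[i + 1] = (unsigned char)(g * 255.0 / (steps - 1) + 0.5);
                    rgb[i + 2] = (unsigned char)(b * 255.0 / (steps - 1) + 0.5);
                }
            }
        }
    }
    return rgb;
}

Note that the shader constants (63.0, 8.0, 0.125, 512.0) would also have to change to match the new layout.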
I believe this could be relevant to Core Image on iOS as well as on Mac OS.
I am able to get a RGB Histogram to show up using CIAreaHistogram + CIHistogramDisplayFilter in Core Image. Is there a way to get just LUMINANCE instead of RGB separately?
Here's how to generate a histogram image (iOS and Mac OS X), provided you've already created a CIImage object (ciImage):
ciImage = [CIFilter filterWithName:@"CIAreaHistogram"
                     keysAndValues:kCIInputImageKey, ciImage,
                                   @"inputExtent", [CIVector vectorWithCGRect:ciImage.extent],
                                   @"inputScale", [NSNumber numberWithFloat:1.0],
                                   @"inputCount", [NSNumber numberWithFloat:256.0],
                                   nil].outputImage;

ciImage = [CIFilter filterWithName:@"CIHistogramDisplayFilter"
                     keysAndValues:kCIInputImageKey, ciImage,
                                   @"inputHeight", [NSNumber numberWithFloat:100.0],
                                   @"inputHighLimit", [NSNumber numberWithFloat:1.0],
                                   @"inputLowLimit", [NSNumber numberWithFloat:0.0],
                                   nil].outputImage;
There are a hundred different solutions out there for displaying a histogram; this is simple (merely two lines of code), and works everywhere, flawlessly.
On to outputting the luminance channel only of a color image, and passing it to the histogram-related filters...
Do you know how to create a custom Core Image filter that returns the output of a CIKernel (or CIColorKernel) object? If not, you should; and, I'd be happy to provide you with easy-to-understand instructions for doing that.
Assuming you do, here's the OpenGL ES code that will return only the luminance values of an image it processes:
vec4 rgb2hsl(vec4 color)
{
    // Compute min and max component values
    float MAX = max(color.r, max(color.g, color.b));
    float MIN = min(color.r, min(color.g, color.b));

    // Make sure MAX > MIN to avoid division by zero later
    MAX = max(MIN + 1e-6, MAX);

    // Compute luminosity
    float l = (MIN + MAX) / 2.0;

    // Compute saturation
    float s = (l < 0.5 ? (MAX - MIN) / (MIN + MAX) : (MAX - MIN) / (2.0 - MAX - MIN));

    // Compute hue
    float h = (MAX == color.r ? (color.g - color.b) / (MAX - MIN) : (MAX == color.g ? 2.0 + (color.b - color.r) / (MAX - MIN) : 4.0 + (color.r - color.g) / (MAX - MIN)));
    h /= 6.0;
    h = (h < 0.0 ? 1.0 + h : h);

    return vec4(h, s, l, color.a);
}

kernel vec4 hsl(sampler image)
{
    // Get a pixel from the image and unpremultiply its alpha
    vec4 pixel = unpremultiply(sample(image, samplerCoord(image)));

    // Convert to HSL; display only the luminance value (the .b component of the HSL vector)
    return premultiply(vec4(vec3(rgb2hsl(pixel).b), 1.0));
}
The above is OpenGL ES code written originally by Apple developers; I modified it to display only the luminance values.
Again: if you don't know how to at least plug kernels into a custom Core Image filter, learn how. I can show you.
Yes, and quite beautifully, too. This app will give you a quick start on creating the appearance of the chart, plus the Objective-C code for using Apple's frameworks and their API for adding it to your app:
http://www.infragistics.com/products/ios
I'm working on the same thing now, so I'd be very interested to see how you're coming along. Please keep in touch.
I was able to create a fragment shader to convert a color image to greyscale, by:
float luminance = pixelColor.r * 0.299 + pixelColor.g * 0.587 + pixelColor.b * 0.114;
gl_FragColor = vec4(luminance, luminance, luminance, 1.0);
Now I'd like to mimic a Photoshop channel mixer effect:
How can I translate the percentage values (-70%, +200%, -30%) into R, G, B floating-point weights (e.g. 0.299, 0.587, 0.114)?
You should know from school that 10% of a value means multiplying that value by 0.1, so just use (-0.7, 2.0, -0.3).
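As a sketch (my own, mirroring the greyscale shader above), the mixed output channel would then be computed and clamped like this:

/* Apply channel-mixer weights (-70%, +200%, -30%) to one pixel's
 * normalized R, G, B values and clamp the result to [0.0, 1.0]. */
static float mix_channels(float r, float g, float b) {
    float v = (-0.7f * r) + (2.0f * g) + (-0.3f * b);
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return v;
}

The result can then be written to all three output channels, just like the luminance value in the greyscale shader.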