Convert YUV4:4:4 to YUV4:2:2 images - image-processing

There is a lot of information on the internet about the differences between the YUV 4:4:4 and YUV 4:2:2 formats; however, I cannot find anything that explains how to convert YUV 4:4:4 to YUV 4:2:2. Since such a conversion is performed in software, I was hoping there would be developers who have done it and could point me to sources describing the conversion algorithm. Of course, having the software code would be nice, but access to the theory would be sufficient to write my own software. Specifically, I would like to know the pixel structure and how the bytes are managed during the conversion.
I found several similar questions, like this and this, but they did not answer my question. I also posted this question on the Photography forum, where it was considered a software question.

The reason you can't find a specific description is that there are many ways to do it.
Let's start with Wikipedia: https://en.wikipedia.org/wiki/Chroma_subsampling#4:2:2
4:4:4:
Each of the three Y'CbCr components have the same sample rate, thus there is no chroma subsampling. This scheme is sometimes used in high-end film scanners and cinematic post production.
and
4:2:2:
The two chroma components are sampled at half the sample rate of luma: the horizontal chroma resolution is halved. This reduces the bandwidth of an uncompressed video signal by one-third with little to no visual difference.
Note: The terms YCbCr and YUV are used interchangeably.
https://en.wikipedia.org/wiki/YCbCr
Y′CbCr is often confused with the YUV color space, and typically the terms YCbCr and YUV are used interchangeably, leading to some confusion; when referring to signals in video or digital form, the term "YUV" mostly means "Y′CbCr".
Data memory ordering:
Again there is more than one format.
Intel IPP documentation defines two main categories: "Pixel-Order Image Formats" and "Planar Image Formats".
There is nice documentation here: https://software.intel.com/en-us/node/503876
Refer here: http://www.fourcc.org/yuv.php#NV12 for YUV pixel arrangement formats.
Refer here: http://scc.ustc.edu.cn/zlsc/sugon/intel/ipp/ipp_manual/IPPI/ippi_ch6/ch6_image_downsampling.htm#ch6_image_downsampling for downsampling description.
Let's assume "Pixel-Order" format:
YUV 4:4:4 data order: Y0 U0 V0 Y1 U1 V1 Y2 U2 V2 Y3 U3 V3
YUV 4:2:2 data order: Y0 U0 Y1 V0 Y2 U1 Y3 V1
Each element is a single byte, and Y0 is the lowest byte in memory.
The 4:2:2 data order described above is known as the YUY2 (also called YUYV) pixel format.
Conversion algorithms:
"Naive sub-sampling":
"Throw" every second U/V component:
Take U0, and throw U1, take V0 and throw V1...
Source: Y0 U0 V0 Y1 U1 V1 Y2 U2 V2
Destination: Y0 U0 Y1 V0 Y2 U2 Y3 V2
I can't recommend it, since it causes aliasing artifacts.
Average each U/V pair:
Take destination U0 to be the average of the source pair, (U0+U1)/2; do the same for V0, and so on.
Source:      Y0 U0 V0 Y1 U1 V1 Y2 U2 V2 Y3 U3 V3
Destination: Y0 (U0+U1)/2 Y1 (V0+V1)/2 Y2 (U2+U3)/2 Y3 (V2+V3)/2
Use another interpolation method for down-sampling U and V (cubic interpolation, for example).
Usually you will not be able to see any difference compared to simple averaging.
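For completeness, here is a minimal sketch of the naive decimation variant for a single pixel-ordered row (same even-width assumption; the function name is just a placeholder). The averaging variant is the one implemented in the C code below:
//Sketch of the "throw away every second U/V" variant for one pixel-ordered row.
//Assumes image_width is even; I0 is YUV 4:4:4 (3 bytes per pixel), J0 is YUY2 (4 bytes per 2 pixels).
static void NaiveRowYUV444ToYUV422(const unsigned char I0[],
                                   const int image_width,
                                   unsigned char J0[])
{
    int x;

    for (x = 0; x < image_width; x += 2)
    {
        J0[x*2]   = I0[x*3];      //Keep Y0
        J0[x*2+1] = I0[x*3+1];    //Keep U0 (U1 at I0[x*3+4] is dropped)
        J0[x*2+2] = I0[x*3+3];    //Keep Y1
        J0[x*2+3] = I0[x*3+2];    //Keep V0 (V1 at I0[x*3+5] is dropped)
    }
}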
C implementation:
The question is not tagged as C, but I think the following C implementation may be helpful.
The following code converts pixel-ordered YUV 4:4:4 to pixel-ordered YUV 4:2:2 by averaging each U/V pair:
//Convert single row I0 from pixel-ordered YUV 4:4:4 to pixel-ordered YUV 4:2:2.
//Save the result in J0.
//I0 size in bytes is image_width*3
//J0 size in bytes is image_width*2
static void ConvertRowYUV444ToYUV422(const unsigned char I0[],
                                     const int image_width,
                                     unsigned char J0[])
{
    int x;

    //Process two Y,U,V triples per iteration:
    for (x = 0; x < image_width; x += 2)
    {
        //Load source elements
        unsigned char y0 = I0[x*3];                  //Load source Y element
        unsigned int u0 = (unsigned int)I0[x*3+1];   //Load source U element (and convert from uint8 to uint32).
        unsigned int v0 = (unsigned int)I0[x*3+2];   //Load source V element (and convert from uint8 to uint32).

        //Load next source elements
        unsigned char y1 = I0[x*3+3];                //Load source Y element
        unsigned int u1 = (unsigned int)I0[x*3+4];   //Load source U element (and convert from uint8 to uint32).
        unsigned int v1 = (unsigned int)I0[x*3+5];   //Load source V element (and convert from uint8 to uint32).

        //Calculate destination U, and V elements.
        //Use shift right by 1 for dividing by 2.
        //Use plus 1 before shifting - round operation instead of floor operation.
        unsigned int u01 = (u0 + u1 + 1) >> 1;       //Destination U element equals average of two source U elements.
        unsigned int v01 = (v0 + v1 + 1) >> 1;       //Destination V element equals average of two source V elements.

        J0[x*2]   = y0;                              //Store Y element (unmodified).
        J0[x*2+1] = (unsigned char)u01;              //Store destination U element (and cast uint32 to uint8).
        J0[x*2+2] = y1;                              //Store Y element (unmodified).
        J0[x*2+3] = (unsigned char)v01;              //Store destination V element (and cast uint32 to uint8).
    }
}
//Convert image I from pixel-ordered YUV 4:4:4 to pixel-ordered YUV 4:2:2.
//I - Input image in pixel-order data YUV 4:4:4 format.
//image_width - Number of columns of image I.
//image_height - Number of rows of image I.
//J - Destination "image" in pixel-order data YUV 4:2:2 format.
//Note: The term "YUV" refers to "Y'CbCr".
//
//I is pixel ordered YUV 4:4:4 format (size in bytes is image_width*image_height*3):
//YUVYUVYUVYUV
//YUVYUVYUVYUV
//YUVYUVYUVYUV
//YUVYUVYUVYUV
//
//J is pixel ordered YUV 4:2:2 format (size in bytes is image_width*image_height*2):
//YUYVYUYV
//YUYVYUYV
//YUYVYUYV
//YUYVYUYV
//
//Conversion algorithm:
//Each element of destination U is the average of 2 original U horizontal elements
//Each element of destination V is the average of 2 original V horizontal elements
//
//Limitations:
//1. image_width must be a multiple of 2.
//2. I and J must be two separate arrays (in-place computation is not supported).
static void ConvertYUV444ToYUV422(const unsigned char I[],
                                  const int image_width,
                                  const int image_height,
                                  unsigned char J[])
{
    //I0 points to the current source row.
    const unsigned char *I0;    //I0 -> YUVYUVYUV...

    //J0 points to the current destination row.
    unsigned char *J0;          //J0 -> YUYVYUYV...

    int y;  //Row index

    //In each iteration: process a single row.
    for (y = 0; y < image_height; y++)
    {
        I0 = &I[y*image_width*3];    //Input row width is image_width*3 bytes (each pixel is Y,U,V).
        J0 = &J[y*image_width*2];    //Output row width is image_width*2 bytes (each two pixels are Y,U,Y,V).

        //Process a single source row into a single destination row.
        ConvertRowYUV444ToYUV422(I0, image_width, J0);
    }
}
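A hypothetical usage sketch (the buffer sizes follow the comments above; ExampleUsage and the 640x480 resolution are placeholders, and the function is assumed to be visible in the same translation unit):
#include <vector>

void ExampleUsage()
{
    const int image_width = 640;    //Assumed even width
    const int image_height = 480;
    std::vector<unsigned char> I(image_width * image_height * 3);  //YUV 4:4:4 source (Y,U,V per pixel)
    std::vector<unsigned char> J(image_width * image_height * 2);  //YUV 4:2:2 destination (Y,U,Y,V per pixel pair)
    //... fill I with pixel-ordered YUV 4:4:4 data ...
    ConvertYUV444ToYUV422(I.data(), image_width, image_height, J.data());
}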
Planar representation of YUV 4:2:2
The planar representation may be more intuitive than the "Pixel-Order" format.
In the planar representation, each color channel is stored as a separate matrix, which can be displayed as an image.
Example:
Original image in RGB format (before converting to YUV):
Image channels in YUV 4:4:4 format:
(Left YUV triple is represented in gray levels, and right YUV triple is represented using false colors).
Image channels in YUV 4:2:2 format (after horizontal Chroma subsampling):
(Left YUV triple is represented in gray levels, and right YUV triple is represented using "false colors").
As you can see, in the 4:2:2 format the U and V channels are down-sampled (shrunk) along the horizontal axis.
Remark:
The "false colors" representation of U and V channels is used for emphasizing that Y is the Luma channel and U and V are the Chrominance channels.
Higher order interpolation and Anti-Aliasing filter:
The following MATLAB code sample shows how to perform down-sampling with higher-order interpolation and an anti-aliasing filter.
The sample also shows the down-sampling method used by FFMPEG.
Note: you don't need to know MATLAB programming in order to understand the samples.
You do need some knowledge of image filtering by convolution between a kernel and an image.
%Prepare the input:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
load('mandrill.mat', 'X', 'map'); %Load input image
RGB = im2uint8(ind2rgb(X, map)); %Convert to RGB (the mandrill sample image is an indexed image)
YUV = rgb2ycbcr(RGB); %Convert from RGB to YUV (MATLAB function rgb2ycbcr uses BT.601 conversion formula)
%Separate YUV to 3 planes (Y plane, U plane and V plane)
Y = YUV(:, :, 1);
U = YUV(:, :, 2);
V = YUV(:, :, 3);
U = double(U); %Work in double precision instead of uint8.
[M, N] = size(Y); %Image size is N columns by M rows.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Linear interpolation without Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Linear interpolation (without Anti-Aliasing filter).
%Simple averaging is equivalent to linear interpolation.
U2 = (U(:, 1:2:end) + U(:, 2:2:end))/2;
refU2 = imresize(U, [M, N/2], 'bilinear', 'Antialiasing', false); %Use MATLAB imresize function as reference
disp(['Linear interpolation max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Cubic interpolation without Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Cubic interpolation (without Anti-Aliasing filter).
%Following operations are equivalent to cubic interpolation:
%1. Convolution with filter kernel [-0.125, 1.25, -0.125]
%2. Averaging pair elements
fU = imfilter(U, [-0.125, 1.25, -0.125], 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
U2 = max(min(U2, 240), 16); %Limit to valid range of U elements (valid range of U elements in uint8 format is [16, 240])
refU2 = imresize(U, [M, N/2], 'cubic', 'Antialiasing', false); %Use MATLAB imresize function as reference
refU2 = max(min(refU2, 240), 16); %Limit to valid range of U elements
disp(['Cubic interpolation max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Linear interpolation with Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Linear interpolation with Anti-Aliasing filter.
%Remark: The Anti-Aliasing filter is the filter used by MATLAB specific implementation of 'bilinear' imresize.
%Following operations are equivalent to Linear interpolation with Anti-Aliasing filter:
%1. Convolution with filter kernel [0.25, 0.5, 0.25]
%2. Averaging pair elements
fU = imfilter(U, [0.25, 0.5, 0.25], 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
refU2 = imresize(U, [M, N/2], 'bilinear', 'Antialiasing', true); %Use MATLAB imresize function as reference
disp(['Linear interpolation with Anti-Aliasing max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Cubic interpolation with Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Cubic interpolation with Anti-Aliasing filter.
%Remark: The Anti-Aliasing filter is the filter used by MATLAB specific implementation of 'cubic' imresize.
%Following operations are equivalent to Cubic interpolation with Anti-Aliasing filter:
%1. Convolution with filter kernel [-0.0234375, -0.046875, 0.2734375, 0.59375, 0.2734375, -0.046875, -0.0234375]
%2. Averaging pair elements
h = [-0.0234375, -0.046875, 0.2734375, 0.59375, 0.2734375, -0.046875, -0.0234375];
fU = imfilter(U, h, 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
U2 = max(min(U2, 240), 16); %Limit to valid range of U elements
refU2 = imresize(U, [M, N/2], 'cubic', 'Antialiasing', true); %Use MATLAB imresize function as reference
refU2 = max(min(refU2, 240), 16); %Limit to valid range of U elements
disp(['Cubic interpolation with Anti-Aliasing max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%FFMPEG implementation of horizontal down-sampling U plane.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%FFMPEG uses cubic interpolation with Anti-Aliasing filter (different filter kernel):
%Remark: I didn't check the source code of FFMPEG to verify the values of the filter kernel.
%I can't tell how FFMPEG actually implements the conversion.
%Following operations are equivalent to FFMPEG implementation (with minor differences):
%1. Convolution with filter kernel [-115, -231, 1217, 2354, 1217, -231, -115]/4096
%2. Averaging pair elements
h = [-115, -231, 1217, 2354, 1217, -231, -115]/4096;
fU = imfilter(U, h, 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
U2 = max(min(U2, 240), 16); %Limit to valid range of U elements (FFMPEG actually doesn't limit the result)
%Save Y,U,V planes to file in format supported by FFMPEG
f = fopen('yuv444.yuv', 'w');
fwrite(f, Y', 'uint8');
fwrite(f, U', 'uint8');
fwrite(f, V', 'uint8');
fclose(f);
%For executing FFMPEG within MATLAB, download FFMPEG and place the executable in working directory (ffmpeg.exe for Windows)
%FFMPEG converts source file in YUV444 format to destination file in YUV422 format.
if isunix
[status, cmdout] = system(['./ffmpeg -y -s ', num2str(N), 'x', num2str(M), ' -pix_fmt yuv444p -i yuv444.yuv -pix_fmt yuv422p yuv422.yuv']);
else
[status, cmdout] = system(['ffmpeg.exe -y -s ', num2str(N), 'x', num2str(M), ' -pix_fmt yuv444p -i yuv444.yuv -pix_fmt yuv422p yuv422.yuv']);
end
f = fopen('yuv422.yuv', 'r');
refY = (fread(f, [N, M], '*uint8'))';
refU2 = (fread(f, [N/2, M], '*uint8'))'; %Read down-sampled U plane (FFMPEG result from file).
refV2 = (fread(f, [N/2, M], '*uint8'))';
fclose(f);
%Limit to valid range of U elements.
%In FFMPEG down-sampled U and V may exceed valid range (there is probably a way to tell FFMPEG to limit the result).
refU2 = max(min(refU2, 240), 16);
%Difference excludes the first and last columns (FFMPEG treats the margins differently than MATLAB)
%Remark: There are minor differences due to rounding (I guess).
disp(['FFMPEG Cubic interpolation with Anti-Aliasing max diff = ' num2str(max(max(abs(double(U2(:, 2:end-1)) - double(refU2(:, 2:end-1))))))]);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
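For readers who prefer C-style code to MATLAB, here is a rough sketch of the "linear interpolation with anti-aliasing filter" variant for a single U (or V) row: convolve with the same [0.25, 0.5, 0.25] kernel, then average each pair. The border is handled by replicating the edge sample, which matches "symmetric" padding for a radius-1 kernel:
//Sketch: horizontally down-sample one U (or V) row by 2 using a [0.25, 0.5, 0.25]
//smoothing filter followed by pair averaging. Assumes an even width;
//the border sample is replicated at the edges.
static void DownsampleRowLinearAA(const unsigned char src[], int width, unsigned char dst[])
{
    for (int x = 0; x < width; x += 2)
    {
        float left  = (x > 0) ? src[x - 1] : src[x];              //Replicated at the left edge
        float right = (x + 2 < width) ? src[x + 2] : src[x + 1];  //Replicated at the right edge
        float f0 = 0.25f*left   + 0.5f*src[x]     + 0.25f*src[x + 1];  //Filtered sample x
        float f1 = 0.25f*src[x] + 0.5f*src[x + 1] + 0.25f*right;       //Filtered sample x+1
        dst[x/2] = (unsigned char)(0.5f*(f0 + f1) + 0.5f);             //Average the pair and round
    }
}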
Examples of the different down-sampling methods:
Linear interpolation versus Cubic interpolation with Anti-Aliasing filter:
In the first example (mandrill) there are no visible differences.
In the second example (circle and rectangle) there are minor visible differences.
The third example (lines) demonstrates aliasing artifacts.
Remark: the displayed images were up-sampled from YUV 4:2:2 to YUV 4:4:4 using cubic interpolation and then converted from YUV 4:4:4 to RGB.
Linear interpolation versus Cubic with Anti-Aliasing (mandrill):
Linear interpolation versus Cubic with Anti-Aliasing (circle and rectangle):
Linear interpolation versus Cubic with Anti-Aliasing (demonstrates Aliasing artifacts):

Related

How to undistort I420 image data? Efficiently

I am able to undistort an RGB image successfully.
Now I am working on undistorting I420 data directly, instead of first converting it to RGB.
Below are the steps I followed after camera calibration.
K = cv::Matx33d(541.2152931632737, 0.0, 661.7479652584254,
                0.0, 541.0606969363056, 317.4524205037745,
                0.0, 0.0, 1.0);
D = cv::Vec4d(-0.042166406281296365, -0.001223961942208027, -0.0017036710622692108, 0.00023929900459453295);
newSize = cv::Size(3400, 1940);
cv::Matx33d new_K;
cv::fisheye::estimateNewCameraMatrixForUndistortRectify(K, D, cv::Size(W, H), cv::Mat::eye(3, 3, CV_64F), new_K, 1, newSize); // W,H are the distorted image size
cv::fisheye::initUndistortRectifyMap(K, D, cv::Mat::eye(3, 3, CV_64F), new_K, newSize, CV_16SC2, mapx, mapy);
cv::remap(src, dst, mapx, mapy, cv::INTER_LINEAR);
The above code gives me an undistorted image successfully.
Now I want to undistort I420 data, so my src will be I420/YV12 data.
How can I undistort I420 data without first converting it to RGB?
By the way:
I420 is an image format with only 1 channel (unlike the 3 channels in RGB). Its height is 1.5 * the image height and its width equals the image width.
The code below converts I420 to BGR:
cvtColor(src, BGR, CV_YUV2BGR_I420, 3);
BGR - pixel arrangement
I420 - pixel arrangement
The most efficient solution is to resize mapx and mapy and apply the shrunk maps to the down-sampled U and V channels:
Shrink mapx and mapy by a factor of 2 in each axis - creating smaller map matrices.
Divide all elements of the shrunk maps by 2 (so the maps apply to the lower-resolution image).
Apply mapx and mapy to the Y color channel.
Apply shrunk_mapx and shrunk_mapy to the down-sampled U and V color channels.
Here is a Python OpenCV sample code (please read the comments):
import cv2 as cv
import numpy as np
# For the example, read Y, U and V as separate images.
srcY = cv.imread('DistortedChessBoardY.png', cv.IMREAD_GRAYSCALE) # Y color channel (1280x720)
srcU = cv.imread('DistortedChessBoardU.png', cv.IMREAD_GRAYSCALE) # U color channel (640x360)
srcV = cv.imread('DistortedChessBoardV.png', cv.IMREAD_GRAYSCALE) # V color channel (640x360)
H, W = srcY.shape[0], srcY.shape[1]
K = np.array([[541.2152931632737, 0.0, 661.7479652584254],
              [0.0, 541.0606969363056, 317.4524205037745],
              [0.0, 0.0, 1.0]])
D = np.array([-0.042166406281296365, -0.001223961942208027, -0.0017036710622692108, 0.00023929900459453295])
# newSize = cv::Size(3400, 1940);
newSize = (850, 480)
# cv::Matx33d new_K;
new_K = np.eye(3)
# cv::fisheye::estimateNewCameraMatrixForUndistortRectify(K, D, cv::Size(W, H), cv::Mat::eye(3, 3, CV_64F), new_K, 1, newSize); // W,H are the distorted image size
new_K = cv.fisheye.estimateNewCameraMatrixForUndistortRectify(K, D, (W, H), np.eye(3), new_K, 1, newSize)
# cv::fisheye::initUndistortRectifyMap(K, D, cv::Mat::eye(3, 3, CV_64F), new_K, newSize, CV_16SC2, mapx, mapy);
mapx, mapy = cv.fisheye.initUndistortRectifyMap(K, D, np.eye(3), new_K, newSize, cv.CV_16SC2);
# cv::remap(src, dst, mapx, mapy, cv::INTER_LINEAR);
dstY = cv.remap(srcY, mapx, mapy, cv.INTER_LINEAR)
# Resize mapx and mapy by a factor of x2 in each axis, and divide each element in the map by 2
shrank_mapSize = (mapx.shape[1]//2, mapx.shape[0]//2)
shrunk_mapx = cv.resize(mapx, shrank_mapSize, interpolation = cv.INTER_LINEAR) // 2
shrunk_mapy = cv.resize(mapy, shrank_mapSize, interpolation = cv.INTER_LINEAR) // 2
# Remap U and V using shrunk maps
dstU = cv.remap(srcU, shrunk_mapx, shrunk_mapy, cv.INTER_LINEAR, borderValue=128)
dstV = cv.remap(srcV, shrunk_mapx, shrunk_mapy, cv.INTER_LINEAR, borderValue=128)
cv.imshow('dstY', dstY)
cv.imshow('dstU', dstU)
cv.imshow('dstV', dstV)
cv.waitKey(0)
cv.destroyAllWindows()
Result:
Y:
U:
V:
After converting to RGB:
C++ implementation considerations:
Since the I420 format arranges Y, U and V as 3 contiguous planes in memory, it's simple to set a pointer to each "plane" and treat it as a grayscale image.
The same data ordering applies to the output image - set 3 pointers to the output "planes".
Illustration (assuming even width and height, and that the byte stride equals the width):

srcY -> YYYYYYYY                 dstY -> YYYYYYYYYYYY
        YYYYYYYY                         YYYYYYYYYYYY
        YYYYYYYY                         YYYYYYYYYYYY
        YYYYYYYY                         YYYYYYYYYYYY
        YYYYYYYY      remap              YYYYYYYYYYYY
        YYYYYYYY     ======>             YYYYYYYYYYYY
srcU -> UUUU                             YYYYYYYYYYYY
        UUUU                             YYYYYYYYYYYY
        UUUU                     dstU -> UUUUUU
srcV -> VVVV                             UUUUUU
        VVVV                             UUUUUU
        VVVV                             UUUUUU
                                 dstV -> VVVVVV
                                         VVVVVV
                                         VVVVVV
                                         VVVVVV
Implementing the above illustration in C++:
Under the assumption that the width and height are even, and that the byte stride equals the width, you can use the following C++ example for converting I420 to Y, U and V planes.
Assume srcI420 is a Wx(H*3/2) matrix in I420 format, e.g. cv::Mat srcI420(cv::Size(W, H * 3 / 2), CV_8UC1);.
int W = 1280, H = 720; //Assume resolution of Y plane is 1280x720
//Pointer to Y plane
unsigned char *pY = (unsigned char*)srcI420.data;
//Y plane as cv::Mat, resolution of srcY is 1280x720
cv::Mat srcY = cv::Mat(cv::Size(W, H), CV_8UC1, (void*)pY);
//U plane as cv::Mat, resolution of srcU is 640x360 (in memory buffer, U plane is placed after Y).
cv::Mat srcU = cv::Mat(cv::Size(W/2, H/2), CV_8UC1, (void*)(pY + W*H));
//V plane as cv::Mat, resolution of srcV is 640x360 (in memory buffer, V plane is placed after U).
cv::Mat srcV = cv::Mat(cv::Size(W / 2, H / 2), CV_8UC1, (void*)(pY + W*H + (W/2*H/2)));
//Display srcY, srcU, srcV for testing
cv::imshow("srcY", srcY);
cv::imshow("srcU", srcU);
cv::imshow("srcV", srcV);
cv::waitKey(0);
The above example uses pointer manipulation, without the need to copy the data.
You can use the same pointer manipulation for your destination I420 image.
Note: The solution is going to work in most cases, but it is not guaranteed to work in all cases.
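As for the destination side mentioned above, here is a rough (untested) sketch: allocate a single dstW x (dstH*3/2) I420 buffer, build plane headers over it with the same pointer arithmetic, and remap each plane. Here mapx/mapy and shrunk_mapx/shrunk_mapy are assumed to be the maps prepared as described above (the shrunk maps resized by 2 and divided by 2), and dstW x dstH is assumed to equal the map size (newSize):
int dstW = 850, dstH = 480;  //Assumed output resolution (equals newSize and the size of mapx/mapy)
cv::Mat dstI420(cv::Size(dstW, dstH * 3 / 2), CV_8UC1);
unsigned char *pDstY = (unsigned char*)dstI420.data;
cv::Mat dstY = cv::Mat(cv::Size(dstW, dstH), CV_8UC1, (void*)pDstY);
cv::Mat dstU = cv::Mat(cv::Size(dstW/2, dstH/2), CV_8UC1, (void*)(pDstY + dstW*dstH));
cv::Mat dstV = cv::Mat(cv::Size(dstW/2, dstH/2), CV_8UC1, (void*)(pDstY + dstW*dstH + (dstW/2)*(dstH/2)));
//Remap each plane; the outputs already have the right size and type, so remap writes into the shared buffer.
cv::remap(srcY, dstY, mapx, mapy, cv::INTER_LINEAR);
cv::remap(srcU, dstU, shrunk_mapx, shrunk_mapy, cv::INTER_LINEAR, cv::BORDER_CONSTANT, cv::Scalar(128));
cv::remap(srcV, dstV, shrunk_mapx, shrunk_mapy, cv::INTER_LINEAR, cv::BORDER_CONSTANT, cv::Scalar(128));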
EDIT: Components are not interleaved in the YV12 format, so the following will not work:
If the YV12 data is a one-channel image, the interpolation of the remap operation is applied to the value represented by all three YUV components together instead of to the individual Y, U and V components.
Therefore, roughly speaking, instead of doing a
c.YYYYYYYY, c.UU, c.VV
it will perform a
c.YYYYYYYYUUVV
during a linear interpolation.
You can perform a YV12 -> BGR color conversion after remap, but the colors of the interpolated pixels would be wrong.
Instead of doing a linear interpolation, try using a nearest-neighbor interpolation in remap. Then you should be able to get correct colors after YV12 -> BGR color conversion.
So, find mapx, mapy, then remap using INTER_NEAREST, and finally perform a YV12 -> BGR color conversion.

Why does image getting brighter when we decrease gamma?

I've read about the power law (Gamma) Transformations so let's look to the equation: s = c*r^γ
Suppose that I have one pixel which has intensity of 37. If the gamma is 0.4 and c is 1, then the output intensity is 37^(0.4) which is 4.2. Thus it's darker, not brighter. But then why does it look brighter in the example in my textbook?
The gamma transformation applies to data in the range [0,1]. So, for your typical unsigned 8-bit integer image, you would have to scale it first to that range. The equation, including the scaling, then would be:
s = 255 * (r/255)^γ
Now you'd have, for r = 37 and γ = 0.4: s = 255 * (37/255)^0.4 = 117.8. This is brighter.
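As a quick illustration, here is the same scaled transform applied to an 8-bit buffer in C++ via a lookup table (a minimal sketch; ApplyGamma is just a placeholder name):
#include <cmath>
#include <cstdint>

//Apply s = 255 * (r/255)^gamma to each 8-bit value using a 256-entry lookup table.
void ApplyGamma(const uint8_t* src, uint8_t* dst, int num_pixels, double gamma)
{
    uint8_t lut[256];
    for (int r = 0; r < 256; r++)
        lut[r] = (uint8_t)(255.0 * std::pow(r / 255.0, gamma) + 0.5);  //Scale to [0,1], apply gamma, scale back and round.
    for (int i = 0; i < num_pixels; i++)
        dst[i] = lut[src[i]];   //e.g. gamma = 0.4 maps 37 to about 118 (brighter).
}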

Does H.264 encoded video with BT.709 matrix include any gamma adjustment?

I have read the BT.709 spec a number of times and the thing that is just not clear is should an encoded H.264 bitstream actually apply any gamma curve to the encoded data? Note the specific mention of a gamma like formula in the BT.709 spec. Apple provided examples of OpenGL or Metal shaders that read YUV data from CoreVideo provided buffers do not do any sort of gamma adjustment. YUV values are being read and processed as though they are simple linear values. I also examined the source code of ffmpeg and found no gamma adjustments being applied after the BT.709 scaling step. I then created a test video with just two linear grayscale colors 5 and 26 corresponding to 2% and 10% levels. When converted to H.264 with both ffmpeg and iMovie, the output BT.709 values are (YCbCr) (20 128 128) and (38 128 128) and these values exactly match the output of the BT.709 conversion matrix without any gamma adjustment.
A great piece of background on this topic can be found at Quicktime Gamma Bug. It seems that some historical issues with Quicktime and Adobe encoders were improperly doing different gamma adjustments and the results made video streams look awful on different players. This is really confusing because if you compare to sRGB, it clearly indicates how to apply a gamma encoding and then decode it to convert between sRGB and linear. Why does BT.709 go into so much detail about the same sort of gamma adjustment curve if no gamma adjustment is applied after the matrix step when creating a h.264 data stream? Are all the color steps in a h.264 stream meant to be coded as straight linear (gamma 1.0) values?
In case specific example input would make things more clear, I am attaching 3 color bar images, the exact values of different colors can be displayed in an image editor with these image files.
This first image is in the sRGB colorspace and is tagged as sRGB.
This second image has been converted to the linear RGB colorspace and is tagged with a linear RGB profile.
This third image has been converted to REC.709 profile levels with Rec709-elle-V4-rec709.icc from elles_icc_profiles. This seems to be what one would need to do to simulate "camera" gamma as described in BT.709.
Note how the sRGB value in the lower right corner (0x555555) becomes linear RGB (0x171717) and the BT.709 gamma encoded value becomes (0x464646). What is unclear is if I should be passing a linear RGB value into ffmpeg or if I should be passing an already BT.709 gamma encoded value which would then need to be decoded in the client before the linear conversion Matrix step to get back to RGB.
Update:
Based on the feedback, I have updated my C based implementation and Metal shader and uploaded to github as an iOS example project MetalBT709Decoder.
Encoding a normalized linear RGB value is implemented like this:
static inline
int BT709_convertLinearRGBToYCbCr(
    float Rn,
    float Gn,
    float Bn,
    int *YPtr,
    int *CbPtr,
    int *CrPtr,
    int applyGammaMap)
{
  // Gamma adjustment to non-linear value
  if (applyGammaMap) {
    Rn = BT709_linearNormToNonLinear(Rn);
    Gn = BT709_linearNormToNonLinear(Gn);
    Bn = BT709_linearNormToNonLinear(Bn);
  }

  // https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.709-6-201506-I!!PDF-E.pdf
  float Ey = (Kr * Rn) + (Kg * Gn) + (Kb * Bn);
  float Eb = (Bn - Ey) / Eb_minus_Ey_Range;
  float Er = (Rn - Ey) / Er_minus_Ey_Range;

  // Quant Y to range [16, 235] (inclusive 219 values)
  // Quant Eb, Er to range [16, 240] (inclusive 224 values, centered at 128)
  float AdjEy = (Ey * (YMax-YMin)) + 16;
  float AdjEb = (Eb * (UVMax-UVMin)) + 128;
  float AdjEr = (Er * (UVMax-UVMin)) + 128;

  *YPtr = (int) round(AdjEy);
  *CbPtr = (int) round(AdjEb);
  *CrPtr = (int) round(AdjEr);

  return 0;
}
Decoding from YCbCr to linear RGB is implemented like so:
static inline
int BT709_convertYCbCrToLinearRGB(
    int Y,
    int Cb,
    int Cr,
    float *RPtr,
    float *GPtr,
    float *BPtr,
    int applyGammaMap)
{
  // https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.709_conversion
  // http://www.niwa.nu/2013/05/understanding-yuv-values/

  // Normalize Y to range [0 255]
  //
  // Note that the matrix multiply will adjust
  // this byte normalized range to account for
  // the limited range [16 235]
  float Yn = (Y - 16) * (1.0f / 255.0f);

  // Normalize Cb and Cr with zero at 128 and range [0 255]
  // Note that matrix will adjust to limited range [16 240]
  float Cbn = (Cb - 128) * (1.0f / 255.0f);
  float Crn = (Cr - 128) * (1.0f / 255.0f);

  const float YScale = 255.0f / (YMax-YMin);
  const float UVScale = 255.0f / (UVMax-UVMin);

  const
  float BT709Mat[] = {
    YScale, 0.000f,                                              (UVScale * Er_minus_Ey_Range),
    YScale, (-1.0f * UVScale * Eb_minus_Ey_Range * Kb_over_Kg),  (-1.0f * UVScale * Er_minus_Ey_Range * Kr_over_Kg),
    YScale, (UVScale * Eb_minus_Ey_Range),                       0.000f,
  };

  // Matrix multiply operation
  //
  // rgb = BT709Mat * YCbCr

  // Convert input Y, Cb, Cr to normalized float values
  float Rn = (Yn * BT709Mat[0]) + (Cbn * BT709Mat[1]) + (Crn * BT709Mat[2]);
  float Gn = (Yn * BT709Mat[3]) + (Cbn * BT709Mat[4]) + (Crn * BT709Mat[5]);
  float Bn = (Yn * BT709Mat[6]) + (Cbn * BT709Mat[7]) + (Crn * BT709Mat[8]);

  // Saturate normalized linear (R G B) to range [0.0, 1.0]
  Rn = saturatef(Rn);
  Gn = saturatef(Gn);
  Bn = saturatef(Bn);

  // Gamma adjustment for RGB components after matrix transform
  if (applyGammaMap) {
    Rn = BT709_nonLinearNormToLinear(Rn);
    Gn = BT709_nonLinearNormToLinear(Gn);
    Bn = BT709_nonLinearNormToLinear(Bn);
  }

  *RPtr = Rn;
  *GPtr = Gn;
  *BPtr = Bn;

  return 0;
}
I believe this logic is implemented correctly, but I am having a very difficult time validating the results. When I generate a .m4v file that contains gamma adjusted color values (osxcolor_test_image_24bit_BT709.m4v), the result come out as expected. But a test case like (bars_709_Frame01.m4v) that I found here does not seem to work as the color bar values seem to be encoded as linear (no gamma adjustment).
For a SMPTE test pattern, the 0.75 graylevel is linear RGB (191 191 191), should this RGB be encoded with no gamma adjustment as (Y Cb Cr) (180 128 128) or should the value in the bitstream appear as the gamma adjusted (Y Cb Cr) (206 128 128)?
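For reference, here is how the two candidate values arise, assuming the BT.709 OETF V = 1.099 * L^0.45 - 0.099 (valid for L >= 0.018) and limited-range quantization Y' = V*219 + 16:
#include <cmath>
#include <cstdio>

int main()
{
    double L = 191.0 / 255.0;                        //The 0.75 linear gray level
    double noGamma = L * 219.0 + 16.0;               //Quantize the linear value directly: ~180
    double V = 1.099 * std::pow(L, 0.45) - 0.099;    //BT.709 OETF (valid for L >= 0.018)
    double withGamma = V * 219.0 + 16.0;             //Quantize the gamma-encoded value: ~206
    std::printf("no gamma: %.1f, with gamma: %.1f\n", noGamma, withGamma);
    return 0;
}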
(follow up)
After doing additional research into this gamma issue, it has become clear that what Apple is actually doing in AVFoundation is using a 1.961 gamma function. This is the case when encoding with AVAssetWriterInputPixelBufferAdaptor, when using vImage, or with CoreVideo APIs. This piecewise gamma function is defined as follows:
#define APPLE_GAMMA_196 (1.960938f)

static inline
float Apple196_nonLinearNormToLinear(float normV) {
  const float xIntercept = 0.05583828f;

  if (normV < xIntercept) {
    normV *= (1.0f / 16.0f);
  } else {
    const float gamma = APPLE_GAMMA_196;
    normV = pow(normV, gamma);
  }

  return normV;
}

static inline
float Apple196_linearNormToNonLinear(float normV) {
  const float yIntercept = 0.00349f;

  if (normV < yIntercept) {
    normV *= 16.0f;
  } else {
    const float gamma = 1.0f / APPLE_GAMMA_196;
    normV = pow(normV, gamma);
  }

  return normV;
}
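A small sanity check of this pair of functions (assuming the definitions above, and their math.h dependency, are in scope): the linear and power segments meet near the intercepts, so encoding followed by decoding returns the input to within floating-point error:
#include <cstdio>

int main()
{
    for (float v = 0.0f; v <= 1.0f; v += 0.1f)
    {
        //Encode to the Apple 1.961 non-linear form, then decode back to linear.
        float roundTrip = Apple196_nonLinearNormToLinear(Apple196_linearNormToNonLinear(v));
        std::printf("%.2f -> %.6f\n", v, roundTrip);   //Expected to print values close to v
    }
    return 0;
}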
Your original question: Does H.264 encoded video with BT.709 matrix include any gamma adjustment?
The encoded video contains a gamma adjustment only if you feed the encoder gamma-adjusted values.
An H.264 encoder doesn't care about the transfer characteristics.
So if you compress linear and then decompress - you'll get linear.
And if you compress with gamma and then decompress - you'll get gamma.
Or if your bits are encoded with a Rec. 709 transfer function - the encoder won't change the gamma.
But you can specify the transfer characteristics in the H.264 stream as metadata (Rec. ITU-T H.264 (04/2017), E.1.1 VUI parameters syntax). So the encoded stream carries the color space information around, but it is not used in encoding or decoding.
I would assume that 8-bit video always contains a non-linear transfer function; otherwise you would be using the 8 bits fairly unwisely.
If you convert to linear to do effects and composition, I'd recommend increasing the bit depth or linearizing into floats.
A color space consists of primaries, transfer function and matrix coefficients.
The gamma adjustment is encoded in the transfer function (and not in the matrix).

OpenCV 2.4.7 Mat::convertTo() 32-bit to 16-bit truncate

I've noticed that using convertTo to convert a matrix from 32-bit to 16-bit saturates values to the upper bound: values bigger than 0x0000FFFF in the source matrix are set to 0xFFFF in the destination matrix.
What I want for my application is instead to mask the values, keeping only the 2 least significant bytes of each value in the destination.
Here is an example:
Mat mat32;
Mat mat16;
mat32 = Mat(2, 2, CV_32SC1);
for (int y = 0; y < 2; y++)
    for (int x = 0; x < 2; x++)
        mat32.at<unsigned int>(cv::Point(x, y)) = 0x0000FFFE + (y*2 + x);
mat32.convertTo(mat16, CV_16UC1);
The matrices have these values:
32 bits matrix:
0000FFFE 0000FFFF
00010000 00010001
16 bits matrix:
0000FFFE 0000FFFF
0000FFFF 0000FFFF
In the second row of 16-bit matrix I want to have
00000000 00000001
I can do this by scanning the source matrix value by value and masking the values, but the performance is low.
Is there an OpenCV function that does this?
Thanks to everyone!
MIX
This can be done, but it requires a somewhat dirty trick, so it is up to you whether to use this approach or not. Here is how it can be done:
For this example, let's create a 1000x1000 32-bit matrix and set all its values to 65541 (= 256*256 + 5). After the conversion we expect to get a matrix filled with fives.
Mat M1(1000, 1000, CV_32S, Scalar(65541));
And here is the trick:
Mat M2(1000, 1000, CV_16SC2, M1.data);
We created matrix M2 over the same memory buffer as M1, but M2 'thinks' that the buffer holds a 2-channel 16-bit image. Now the last thing to do is to copy the channel you need to the place you need. This can be done with the split() or mixChannels() functions. For example:
Mat M3(1000, 1000, CV_16S);
int fromto[] = {0,0};
Mat inpu[] = {M2}, outpu[] = {M3};
mixChannels(inpu, 1, outpu, 1, fromto, 1);
cout << M3.at<short>(10,10) << endl;
Yes, I know that the signature of mixChannels looks weird and makes the code even less readable, but it works... If you prefer the split() function:
vector<Mat> v;
split(M2,v);
cout << v[0].at<short>(10,10) << " " << v[1].at<short>(10,10) << endl;
There is no OpenCV function (that I know of) which does the conversion the way you want, so either you code it yourself or, as you said, you go through a masking step first to remove the 16 high bits.
The mask can be applied using bitwise_and in C++ or cvAndS in C. See here.
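For example, a minimal sketch of that masking step using bitwise_and with a scalar, followed by convertTo:
cv::Mat masked;
cv::bitwise_and(mat32, cv::Scalar(0xFFFF), masked);  //Clear the 16 high bits of every element
masked.convertTo(mat16, CV_16UC1);                   //All values now fit in 16 bits, so no saturation occurs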
You could also have made your hand-written code more efficient. In general, you should avoid OpenCV pixel accessors in loops because they have bad performance. I don't have an OpenCV install at hand, so this could be slightly off -- the idea is to use the data field directly, together with step, which is the number of bytes per row:
for (int y = 0; y < mat32.rows; y++) {
    int* row = (int*)(mat32.data + y * mat32.step);
    for (int x = 0; x < mat32.cols; x++)
        row[x] &= 0xffff;
}
Then, once the mask is applied, all values fit in 16 bits, and convertTo simply copies them over (no saturation occurs).
The other solution is to code the conversion by hand:
mat16.create(mat32.size(), CV_16UC1);
for (int y = 0; y < mat32.rows; y++) {
    const int* row32 = (const int*)(mat32.data + y * mat32.step);
    unsigned short* row16 = (unsigned short*)(mat16.data + y * mat16.step);
    for (int x = 0; x < mat32.cols; x++)
        row16[x] = (unsigned short)row32[x];
}

OpenCV Mat per-element operation: vector-matrix multiplication

I is an mxn matrix and each element of I is a 1x3 vector (I is a 3-channel Mat image actually).
M is a 3x3 matrix.
J is a matrix with the same dimensions as I, computed as follows: each element of J is the vector-matrix product of the corresponding element of I (i.e. the one with the same coordinates) and M.
I.e. if v1(r1,g1,b1) is an element of I and v2(r2,g2,b2) is its corresponding element of J, then v2 = v1 * M (this is a vector-matrix product, not a per-element product).
Question: How to compute J efficiently (in terms of speed)?
Thank you for your help.
As far as I know, the most efficient way to implement such an operation is as follows:
Reshape I from mxnx3 to (m·n)x3, let's call it I'
Calculate J' = I' * M
Reshape J' from (m·n)x3 to mxnx3, this is the J we wanted
The idea is to stack each pixel-wise operation pi'·M into one single operation P'·M, where P is the 3x(m·n) matrix containing each pixel in columns (hence P' holds one pixel per row. It's just a convention, really).
Here is a code sample written in C++:
// read some image
cv::Mat I = cv::imread("image.png");          // rows x cols x 3, 8-bit
// some matrix M, that modifies each pixel
cv::Mat M = (cv::Mat_<float>(3, 3) << 0,  0,  0,
                                      0, .5,  0,
                                      0,  0, .5); // 3 x 3
// remember old dimensions
int prevChannels = I.channels();
int prevRows = I.rows;
// convert to float, because cv::Mat matrix multiplication requires a floating-point type
I.convertTo(I, CV_32F);
// reshape I
int newRows = I.rows * I.cols;
I = I.reshape(1, newRows);                    // (rows * cols) x 3
// compute J
cv::Mat J = I * M;                            // (rows * cols) x 3
// reshape to original dimensions
J = J.reshape(prevChannels, prevRows);        // rows x cols x 3 (use J.convertTo(J, CV_8U) if an 8-bit result is needed)
OpenCV provides an O(1) reshaping operation.
Thus performance depends solely on matrix multiplication, which I expect to be as efficient as possible in a computer vision library.
To further enhance performance, you might want to take a look at matrix multiplication using the ocl and gpu modules.
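As an aside, OpenCV's cv::transform applies a matrix to every pixel's channel vector directly, without any reshaping; it treats each pixel as a column vector, so pass the transpose of M to match the row-vector convention v2 = v1 * M used in the question. A minimal sketch:
cv::Mat src = cv::imread("image.png");   // The original rows x cols x 3 image (not the reshaped one)
cv::Mat src32f, dst;
src.convertTo(src32f, CV_32F);           // Work in float to avoid 8-bit saturation of intermediate results
cv::transform(src32f, dst, M.t());       // dst(y,x) = M.t() * src32f(y,x), i.e. v2 = v1 * M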
