How to undistort I420 image data efficiently? - opencv

I am able to undistort an RGB image successfully.
Now I am working on undistorting I420 data directly, instead of first converting it to RGB.
Below are the steps I followed after camera calibration.
K = cv::Matx33d(541.2152931632737, 0.0, 661.7479652584254,
0.0, 541.0606969363056, 317.4524205037745,
0.0, 0.0, 1.0);
D = cv::Vec4d(-0.042166406281296365, -0.001223961942208027, -0.0017036710622692108, 0.00023929900459453295);
newSize = cv::Size(3400, 1940);
cv::Matx33d new_K;
cv::fisheye::estimateNewCameraMatrixForUndistortRectify(K, D, cv::Size(W, H), cv::Mat::eye(3, 3, CV_64F), new_K, 1, newSize); // W,H are the distorted image size
cv::fisheye::initUndistortRectifyMap(K, D, cv::Mat::eye(3, 3, CV_64F), new_K, newSize, CV_16SC2, mapx, mapy);
cv::remap(src, dst, mapx, mapy, cv::INTER_LINEAR);
The above code gives me the undistorted image successfully.
Now I want to undistort I420 data, so my src will be I420/YV12 data.
How can I undistort I420 data without converting it to RGB first?
By the way:
I420 is an image format with only 1 channel (unlike the 3 channels of RGB). Its height is 1.5 × the image height, and its width equals the image width.
The code below converts I420 to BGR:
cvtColor(src, BGR, CV_YUV2BGR_I420, 3);
BGR - pixel arrangement
I420 - pixel arrangement

The most efficient solution is to resize mapx and mapy and apply the shrunk maps to the down-sampled U and V channels:
Shrink mapx and mapy by a factor of 2 in each axis - create smaller map matrices.
Divide all elements of the shrunk maps by 2 (so the mapping applies to the lower-resolution image).
Apply mapx and mapy to the Y color channel.
Apply shrunk_mapx and shrunk_mapy to the down-sampled U and V color channels.
Here is a Python OpenCV sample code (please read the comments):
import cv2 as cv
import numpy as np
# For the example, read Y, U and V as separate images.
srcY = cv.imread('DistortedChessBoardY.png', cv.IMREAD_GRAYSCALE) # Y color channel (1280x720)
srcU = cv.imread('DistortedChessBoardU.png', cv.IMREAD_GRAYSCALE) # U color channel (640x360)
srcV = cv.imread('DistortedChessBoardV.png', cv.IMREAD_GRAYSCALE) # V color channel (640x360)
H, W = srcY.shape[0], srcY.shape[1]
K = np.array([[541.2152931632737, 0.0, 661.7479652584254],
[0.0, 541.0606969363056, 317.4524205037745],
[0.0, 0.0, 1.0]])
D = np.array([-0.042166406281296365, -0.001223961942208027, -0.0017036710622692108, 0.00023929900459453295])
# newSize = cv::Size(3400, 1940);
newSize = (850, 480)
# cv::Matx33d new_K;
new_K = np.eye(3)
# cv::fisheye::estimateNewCameraMatrixForUndistortRectify(K, D, cv::Size(W, H), cv::Mat::eye(3, 3, CV_64F), new_K, 1, newSize); // W,H are the distorted image size
new_K = cv.fisheye.estimateNewCameraMatrixForUndistortRectify(K, D, (W, H), np.eye(3), new_K, 1, newSize)
# cv::fisheye::initUndistortRectifyMap(K, D, cv::Mat::eye(3, 3, CV_64F), new_K, newSize, CV_16SC2, mapx, mapy);
mapx, mapy = cv.fisheye.initUndistortRectifyMap(K, D, np.eye(3), new_K, newSize, cv.CV_16SC2)
# cv::remap(src, dst, mapx, mapy, cv::INTER_LINEAR);
dstY = cv.remap(srcY, mapx, mapy, cv.INTER_LINEAR)
# Shrink mapx and mapy by a factor of 2 in each axis, and divide each element in the maps by 2
shrunk_mapSize = (mapx.shape[1]//2, mapx.shape[0]//2)
shrunk_mapx = cv.resize(mapx, shrunk_mapSize, interpolation = cv.INTER_LINEAR) // 2
shrunk_mapy = cv.resize(mapy, shrunk_mapSize, interpolation = cv.INTER_LINEAR) // 2
# Remap U and V using the shrunk maps
dstU = cv.remap(srcU, shrunk_mapx, shrunk_mapy, cv.INTER_LINEAR, borderValue=128)
dstV = cv.remap(srcV, shrunk_mapx, shrunk_mapy, cv.INTER_LINEAR, borderValue=128)
cv.imshow('dstY', dstY)
cv.imshow('dstU', dstU)
cv.imshow('dstV', dstV)
cv.waitKey(0)
cv.destroyAllWindows()
Result: the undistorted Y, U and V planes, and the image after converting back to RGB.
C++ implementation considerations:
Since the I420 format arranges Y, U and V as 3 contiguous planes in memory, it's simple to set a pointer to each "plane" and treat it as a grayscale image.
The same data ordering applies to the output image - set 3 pointers to the output "planes".
Illustration (assuming even width and height, and a byte stride equal to the width):
srcY -> YYYYYYYY              dstY -> YYYYYYYYYYYY
        YYYYYYYY                      YYYYYYYYYYYY
        YYYYYYYY                      YYYYYYYYYYYY
        YYYYYYYY                      YYYYYYYYYYYY
        YYYYYYYY    remap             YYYYYYYYYYYY
        YYYYYYYY   ======>            YYYYYYYYYYYY
srcU -> UUUU                          YYYYYYYYYYYY
        UUUU                          YYYYYYYYYYYY
        UUUU                  dstU -> UUUUUU
srcV -> VVVV                          UUUUUU
        VVVV                          UUUUUU
        VVVV                          UUUUUU
                              dstV -> VVVVVV
                                      VVVVVV
                                      VVVVVV
                                      VVVVVV
Implementing the above illustration in C++:
Under the assumption that width and height are even, and the byte stride equals the width, you can use the following C++ example for splitting an I420 buffer into Y, U and V planes.
Assume srcI420 is a W x (H*3/2) matrix in I420 format, e.g. cv::Mat srcI420(cv::Size(W, H * 3 / 2), CV_8UC1);.
int W = 1280, H = 720; //Assume resolution of Y plane is 1280x720
//Pointer to Y plane
unsigned char *pY = (unsigned char*)srcI420.data;
//Y plane as cv::Mat, resolution of srcY is 1280x720
cv::Mat srcY = cv::Mat(cv::Size(W, H), CV_8UC1, (void*)pY);
//U plane as cv::Mat, resolution of srcU is 640x360 (in memory buffer, U plane is placed after Y).
cv::Mat srcU = cv::Mat(cv::Size(W/2, H/2), CV_8UC1, (void*)(pY + W*H));
//V plane as cv::Mat, resolution of srcV is 640x360 (in memory buffer, V plane is placed after U).
cv::Mat srcV = cv::Mat(cv::Size(W / 2, H / 2), CV_8UC1, (void*)(pY + W*H + (W/2*H/2)));
//Display srcY, srcU, srcV for testing
cv::imshow("srcY", srcY);
cv::imshow("srcU", srcU);
cv::imshow("srcV", srcV);
cv::waitKey(0);
The above example uses pointer manipulation, without copying the data.
You can use the same pointer manipulation for your destination I420 image.
Note: The solution is going to work in most cases, but not guaranteed to work in all cases.
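For completeness, here is a minimal C++ sketch combining the shrunk maps with the plane views (illustrative only; it assumes even dimensions and a byte stride equal to the width, and it uses CV_32FC1 maps instead of CV_16SC2 so that halving the chroma map coordinates is a plain multiplication):
#include <opencv2/opencv.hpp>

// Undistort an I420 frame plane by plane, without converting to BGR.
// srcI420 is a W x (H*3/2) CV_8UC1 Mat; the returned Mat has the same layout at newSize.
cv::Mat undistortI420(const cv::Mat& srcI420, const cv::Matx33d& K, const cv::Vec4d& D,
                      int W, int H, cv::Size newSize)
{
    // Build floating-point maps so that halving the coordinates for the half-size U/V planes is trivial.
    cv::Mat new_K, mapx, mapy;
    cv::fisheye::estimateNewCameraMatrixForUndistortRectify(K, D, cv::Size(W, H),
        cv::Mat::eye(3, 3, CV_64F), new_K, 1, newSize);
    cv::fisheye::initUndistortRectifyMap(K, D, cv::Mat::eye(3, 3, CV_64F), new_K,
        newSize, CV_32FC1, mapx, mapy);

    // Half-resolution maps for the chroma planes: resize, then halve the coordinates.
    cv::Mat mapxHalf, mapyHalf;
    cv::resize(mapx, mapxHalf, cv::Size(newSize.width / 2, newSize.height / 2));
    cv::resize(mapy, mapyHalf, cv::Size(newSize.width / 2, newSize.height / 2));
    mapxHalf *= 0.5;
    mapyHalf *= 0.5;

    // Source planes as zero-copy views into the I420 buffer.
    const unsigned char* pSrc = srcI420.data;
    cv::Mat srcY(cv::Size(W, H), CV_8UC1, (void*)pSrc);
    cv::Mat srcU(cv::Size(W / 2, H / 2), CV_8UC1, (void*)(pSrc + W * H));
    cv::Mat srcV(cv::Size(W / 2, H / 2), CV_8UC1, (void*)(pSrc + W * H + (W / 2) * (H / 2)));

    // Destination I420 buffer and its plane views.
    int dW = newSize.width, dH = newSize.height;
    cv::Mat dstI420(cv::Size(dW, dH * 3 / 2), CV_8UC1);
    unsigned char* pDst = dstI420.data;
    cv::Mat dstY(cv::Size(dW, dH), CV_8UC1, pDst);
    cv::Mat dstU(cv::Size(dW / 2, dH / 2), CV_8UC1, pDst + dW * dH);
    cv::Mat dstV(cv::Size(dW / 2, dH / 2), CV_8UC1, pDst + dW * dH + (dW / 2) * (dH / 2));

    // Remap each plane; use a neutral chroma value (128) for out-of-range pixels.
    cv::remap(srcY, dstY, mapx, mapy, cv::INTER_LINEAR);
    cv::remap(srcU, dstU, mapxHalf, mapyHalf, cv::INTER_LINEAR, cv::BORDER_CONSTANT, cv::Scalar(128));
    cv::remap(srcV, dstV, mapxHalf, mapyHalf, cv::INTER_LINEAR, cv::BORDER_CONSTANT, cv::Scalar(128));

    return dstI420;
}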

EDIT: Components are not interleaved in the YV12 format, so the following will not work:
If the YV12 data is treated as a one-channel image, the interpolation of the remap operation is applied to the value formed by all three YUV components together instead of to the individual Y, U and V components.
Therefore, roughly speaking, instead of interpolating
c.YYYYYYYY, c.UU, c.VV
separately, it will interpolate
c.YYYYYYYYUUVV
during a linear interpolation.
You can perform a YV12 -> BGR color conversion after remap, but the colors of the interpolated pixels will be wrong.
Instead of linear interpolation, try using nearest-neighbor interpolation in remap; then you should get correct colors after the YV12 -> BGR color conversion.
So: find mapx and mapy, remap using INTER_NEAREST, and finally perform a YV12 -> BGR color conversion.

Related

Change the background of rotated image to White instead of black after using Projection Profile Deskew method in OpenCV (or skimage)

I have used the Projection Profile method on my binary images to get the deskewed version. Everything is fine, but the rotated image has black areas where the deskewing has been applied. How can I convert those areas to white instead of black? Below is the code for the Projection Profile method.
import cv2
import numpy as np
# assumed import: 'inter' refers to scipy.ndimage.interpolation (provides rotate)
from scipy.ndimage import interpolation as inter

def correct_skew(image, delta=1, limit=5):
    """
    image : input
    delta : sampling in the -limit, limit + delta range
    limit : range of angles to explore
    """
    # Function that returns the score of the histogram for the given angle
    def determine_score(arr, angle):
        """
        arr : binarized image
        angle : angle at which we calculate the score
        """
        data = inter.rotate(arr, angle, reshape=False, order=0)
        histogram = np.sum(data, axis=1)
        score = np.sum((histogram[1:] - histogram[:-1]) ** 2)
        return histogram, score

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    scores = []
    angles = np.arange(-limit, limit + delta, delta)
    for angle in angles:
        histogram, score = determine_score(thresh, angle)
        scores.append(score)

    best_angle = angles[scores.index(max(scores))]

    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC)

    return best_angle, rotated
This is the Image after Deskewing:
Original Binary Image:
The cv2.warpAffine documentation states that the function takes an optional argument named borderValue. By default this value is (0, 0, 0); you can change it by calling your warpAffine routine as:
rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode = cv2.BORDER_CONSTANT, borderValue=np.array([255, 255, 255]))

How to draw rectangle on a YUV420p frame

I would like to draw a red rectangle using the following function on a YUV420p frame. The following code alters the frame, but all I get is two black lines (top and bottom) and scattered black dots. Any suggestions?
void draw_rectangle(uint8_t *frame, int x, int y,
                    int width, int height,
                    int img_width, int img_height)
{
    cv::Mat frame_yuv;
    int size[2];
    Point pt1, pt2;
    cv::Scalar color = Scalar(255, 0, 0);

    size[0] = img_width + img_width/2;
    size[1] = img_height;
    frame_yuv = cv::Mat(2, size, CV_8UC1, frame);

    pt1.x = x;
    pt1.y = y;
    pt2.x = x + width;
    pt2.y = y + height;
    rectangle(frame_yuv, pt1, pt2, Scalar(0, 0, 255));
}
Finally, I got my code working. Steps are given below for reference.
frame_yuv = cv::Mat(2, size, CV_8UC1, frame);  // single-channel view of the YUV420p frame
cv::Mat C(2, 2, CV_8UC3, color);               // tiny BGR patch filled with the desired color
cv::Mat C_yuv;
cvtColor(C, C_yuv, cv::COLOR_BGR2YUV_I420);
// The desired B, G, R color is now represented in C_yuv as Y, U, V values
// Extract the Y, U, V components of the frame into separate Mat's
// Apply the rectangle first on the Y component
// Divide each point pt1, pt2 by 2
// Apply the rectangle on U and V
No extra copy of the frame is done.
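A minimal C++ sketch of those steps (illustrative only; it assumes even frame dimensions, even rectangle coordinates, and a byte stride equal to the width):
#include <cstdint>
#include <opencv2/opencv.hpp>

// Draw a BGR-specified rectangle directly on a YUV420p (I420) buffer.
void draw_rectangle_i420(uint8_t *frame, int x, int y, int width, int height,
                         int img_width, int img_height, const cv::Scalar &bgr)
{
    // Convert the BGR color to its Y, U, V values using a tiny 2x2 patch.
    cv::Mat patch(2, 2, CV_8UC3, bgr);
    cv::Mat patch_i420;
    cv::cvtColor(patch, patch_i420, cv::COLOR_BGR2YUV_I420);  // 3x2, CV_8UC1
    uchar Yv = patch_i420.at<uchar>(0, 0);
    uchar Uv = patch_i420.at<uchar>(2, 0);
    uchar Vv = patch_i420.at<uchar>(2, 1);

    // Plane views into the frame (no copy).
    cv::Mat planeY(img_height, img_width, CV_8UC1, frame);
    cv::Mat planeU(img_height / 2, img_width / 2, CV_8UC1, frame + img_width * img_height);
    cv::Mat planeV(img_height / 2, img_width / 2, CV_8UC1,
                   frame + img_width * img_height + (img_width / 2) * (img_height / 2));

    // Full-resolution rectangle on Y, half-resolution rectangle on U and V.
    cv::rectangle(planeY, cv::Point(x, y), cv::Point(x + width, y + height), cv::Scalar(Yv));
    cv::rectangle(planeU, cv::Point(x / 2, y / 2), cv::Point((x + width) / 2, (y + height) / 2), cv::Scalar(Uv));
    cv::rectangle(planeV, cv::Point(x / 2, y / 2), cv::Point((x + width) / 2, (y + height) / 2), cv::Scalar(Vv));
}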
As you haven't provided any sample data, please use the file kindly provided by @zindarod, with dimensions 144x176.
Here is how the YUV data looks in memory:
Notice in the stream along the bottom: all the Y pixels come first, then all the U pixels (downsampled by a factor of 4), then all the V pixels (also downsampled by a factor of 4).
I haven't got time to write the code in OpenCV, but I can show you how to make a regular Mat out of it.
Step 1 - Extract Y channel
Take the first 144x176 bytes and put them into a 144x176 8UC1 Mat called Y.
Step 2 - Extract U channel
Skip the first 144x176 bytes and then take the next 72x88 bytes and put them into another 72x88 8UC1 Mat called U. Resize this Mat to double the width and double the height, i.e. 144x176.
Step 3 - Extract the V channel
Skip the first (144x176) + (88x72) bytes and then take the next 72x88 bytes and put them into another 72x88 8UC1 Mat called V. Resize this Mat to double the width and double the height, i.e. 144x176.
Step 4 - Merge
Take the Y, U, and V Mats and merge them into an 8UC3 Mat:
// Now merge the 3 individual channels into 3-band bad boy
auto channels = std::vector<cv::Mat>{Y, U, V};
cv::Mat ThreeBandBoy;
cv::merge(channels, ThreeBandBoy);
There is some code here that does more or less exactly what is needed for Steps 1-3.
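A short C++ sketch of Steps 1-3 under the assumptions above (a contiguous 176x144 I420 buffer; the function name is illustrative):
#include <cstdint>
#include <opencv2/opencv.hpp>

// Steps 1-3: build Y, U and V Mats from a contiguous 176x144 (W x H) I420 buffer.
// 'data' is assumed to hold 176*144*3/2 bytes.
void extractPlanes(uint8_t *data, cv::Mat &Y, cv::Mat &U, cv::Mat &V)
{
    const int W = 176, H = 144;

    // Step 1: the first W*H bytes are the full-resolution Y plane.
    Y = cv::Mat(H, W, CV_8UC1, data).clone();

    // Step 2: the next (W/2)*(H/2) bytes are the U plane; resize it back to W x H.
    cv::Mat Uhalf(H / 2, W / 2, CV_8UC1, data + W * H);
    cv::resize(Uhalf, U, cv::Size(W, H), 0, 0, cv::INTER_LINEAR);

    // Step 3: the next (W/2)*(H/2) bytes are the V plane; resize it back to W x H.
    cv::Mat Vhalf(H / 2, W / 2, CV_8UC1, data + W * H + (W / 2) * (H / 2));
    cv::resize(Vhalf, V, cv::Size(W, H), 0, 0, cv::INTER_LINEAR);
}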
I am reading this YUV image from a file; it is in YUV I420 format.
fstream file;
file.open("yuv_i420.yuv", ios::in | ios::binary);

// size of image in RGB
size_t rows = 144, cols = 176;

if (!file.is_open())
    cerr << "Error opening file" << endl;
else {
    // get total size of file
    auto size = file.tellg();
    file.seekg(0, ios::end);
    size = file.tellg() - size;
    file.seekg(0, ios::beg);

    char *buffer = new char[size];

    // read image from file
    if (file.read(buffer, size)) {
        // create YUV Mat
        Mat yuv_I420(rows + rows / 2, cols, CV_8UC1, buffer);

        // draw a rectangle on the YUV image; keep in mind that the YUV image is a
        // single-channel grayscale image whose size differs from the BGR image
        rectangle(yuv_I420, Point(10, 10), Point(50, 50), Scalar(255));

        // convert to BGR to check validity
        Mat bgr;
        cvtColor(yuv_I420, bgr, cv::COLOR_YUV2BGR_I420);

        cv::imshow("image", bgr);
        cv::waitKey(0);
    }

    file.close();
    delete[] buffer;
}

How to extract these 6 symbols (signatures) from paper (opencv)

I have an image:
and I'm trying to extract the signatures one by one.
I tried findContours() but I got a lot of internal contours. Is there any way to do this?
When finding contours, always ensure that the regions of interest are in white. In this case, after converting the image to grayscale, apply an inverted binary threshold so that the signatures become white. After doing so, findContours() will easily find all the signatures.
Code:
The following implementation is in Python:
import cv2
image = cv2.imread(r'C:\Users\Jackson\Desktop\sign.jpg')
#--- Image was too big hence I resized it ---
image = cv2.resize(image, (0, 0), fx = 0.5, fy = 0.5)
#--- Converting image to grayscale ---
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
#--- Performing inverted binary threshold ---
retval, thresh_gray = cv2.threshold(gray, 0, 255, type = cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
cv2.imshow('sign_thresh_gray', thresh_gray)
#--- finding contours ---
image, contours, hierarchy = cv2.findContours(thresh_gray,cv2.RETR_EXTERNAL, \
cv2.CHAIN_APPROX_SIMPLE)
for i, c in enumerate(contours):
    if cv2.contourArea(c) > 100:
        x, y, w, h = cv2.boundingRect(c)
        roi = image[y : y + h, x : x + w]
        cv2.imshow('sign_{}.jpg'.format(i), roi)
cv2.waitKey()
cv2.destroyAllWindows()
Result:
Here I have some of the extracted signatures.

Convert YUV4:4:4 to YUV4:2:2 images

There is a lot of information on the internet about the differences between the YUV 4:4:4 and YUV 4:2:2 formats; however, I cannot find anything that explains how to convert YUV 4:4:4 to YUV 4:2:2. Since such conversion is performed in software, I was hoping there would be developers who have done it and could direct me to sources describing the conversion algorithm. Of course, the code itself would be nice to have, but access to the theory would be sufficient for writing my own software. Specifically, I would like to know the pixel structure and how the bytes are managed during the conversion.
I found several similar questions, like this and this; however, they did not answer my question. I also posted this question on the Photography forum, where it was considered a software question.
The reason you can't find a specific description is that there are many ways to do it.
Let's start with Wikipedia: https://en.wikipedia.org/wiki/Chroma_subsampling#4:2:2
4:4:4:
Each of the three Y'CbCr components have the same sample rate, thus there is no chroma subsampling. This scheme is sometimes used in high-end film scanners and cinematic post production.
and
4:2:2:
The two chroma components are sampled at half the sample rate of luma: the horizontal chroma resolution is halved. This reduces the bandwidth of an uncompressed video signal by one-third with little to no visual difference.
Note: Terms YCbCr and YUV are used interchangeably.
https://en.wikipedia.org/wiki/YCbCr
Y′CbCr is often confused with the YUV color space, and typically the terms YCbCr and YUV are used interchangeably, leading to some confusion; when referring to signals in video or digital form, the term "YUV" mostly means "Y′CbCr".
Data memory ordering:
Again there is more than one format.
Intel IPP documentation defines two main categories: "Pixel-Order Image Formats" and "Planar Image Formats".
There is a nice documentation here: https://software.intel.com/en-us/node/503876
Refer here: http://www.fourcc.org/yuv.php#NV12 for YUV pixel arrangement formats.
Refer here: http://scc.ustc.edu.cn/zlsc/sugon/intel/ipp/ipp_manual/IPPI/ippi_ch6/ch6_image_downsampling.htm#ch6_image_downsampling for downsampling description.
Let's assume "Pixel-Order" format:
YUV 4:4:4 data order: Y0 U0 V0 Y1 U1 V1 Y2 U2 V2 Y3 U3 V3
YUV 4:2:2 data order: Y0 U0 Y1 V0 Y2 U1 Y3 V1
Each element is a single byte, and Y0 is the lower byte in memory.
The 4:2:2 data order described above is named the YUY2 (YUYV) pixel format.
Conversion algorithms:
"Naive sub-sampling":
"Throw" every second U/V component:
Take U0, and throw U1, take V0 and throw V1...
Source: Y0 U0 V0 Y1 U1 V1 Y2 U2 V2 Y3 U3 V3
Destination: Y0 U0 Y1 V0 Y2 U2 Y3 V2
I can't recommend it, since it causes aliasing artifacts.
Average each U/V pair:
Take Destination U0 equals source (U0+U1)/2, same for V0...
Source: Y0 U0 V0 Y1 U1 V1 Y2 U2 V2 Y3 U3 V3
Destination: Y0 (U0+U1)/2 Y1 (V0+V1)/2 Y2 (U2+U3)/2 Y3 (V2+V3)/2
Alternatively, use another interpolation method for down-sampling U and V (cubic interpolation, for example).
Usually you will not be able to see any difference compared to simple averaging.
C implementation:
The question is not tagged as C, but I think the following C implementation may be helpful.
The following code converts pixel-ordered YUV 4:4:4 to pixel-ordered YUV 4:2:2 by averaging each U/V pair:
//Convert a single row I0 from pixel-ordered YUV 4:4:4 to pixel-ordered YUV 4:2:2.
//Save the result in J0.
//I0 size in bytes is image_width*3
//J0 size in bytes is image_width*2
static void ConvertRowYUV444ToYUV422(const unsigned char I0[],
                                     const int image_width,
                                     unsigned char J0[])
{
    int x;

    //Process two Y,U,V triples per iteration:
    for (x = 0; x < image_width; x += 2)
    {
        //Load source elements
        unsigned char y0 = I0[x*3];                 //Load source Y element
        unsigned int u0 = (unsigned int)I0[x*3+1];  //Load source U element (and convert from uint8 to uint32).
        unsigned int v0 = (unsigned int)I0[x*3+2];  //Load source V element (and convert from uint8 to uint32).

        //Load next source elements
        unsigned char y1 = I0[x*3+3];               //Load source Y element
        unsigned int u1 = (unsigned int)I0[x*3+4];  //Load source U element (and convert from uint8 to uint32).
        unsigned int v1 = (unsigned int)I0[x*3+5];  //Load source V element (and convert from uint8 to uint32).

        //Calculate destination U and V elements.
        //Use shift right by 1 for dividing by 2.
        //Add 1 before shifting - round operation instead of floor operation.
        unsigned int u01 = (u0 + u1 + 1) >> 1;      //Destination U element equals average of two source U elements.
        unsigned int v01 = (v0 + v1 + 1) >> 1;      //Destination V element equals average of two source V elements.

        J0[x*2]   = y0;                             //Store Y element (unmodified).
        J0[x*2+1] = (unsigned char)u01;             //Store destination U element (and cast uint32 to uint8).
        J0[x*2+2] = y1;                             //Store Y element (unmodified).
        J0[x*2+3] = (unsigned char)v01;             //Store destination V element (and cast uint32 to uint8).
    }
}
//Convert image I from pixel-ordered YUV 4:4:4 to pixel-ordered YUV 4:2:2.
//I - Input image in pixel-order data YUV 4:4:4 format.
//image_width - Number of columns of image I.
//image_height - Number of rows of image I.
//J - Destination "image" in pixel-order data YUV 4:2:2 format.
//Note: The term "YUV" referees to "Y'CbCr".
//I is pixel ordered YUV 4:4:4 format (size in bytes is image_width*image_height*3):
//YUVYUVYUVYUV
//YUVYUVYUVYUV
//YUVYUVYUVYUV
//YUVYUVYUVYUV
//
//J is pixel ordered YUV 4:2:2 format (size in bytes is image_width*image_height*2):
//YUYVYUYV
//YUYVYUYV
//YUYVYUYV
//YUYVYUYV
//
//Conversion algorithm:
//Each element of destination U is average of 2 original U horizontal elements
//Each element of destination V is average of 2 original V horizontal elements
//
//Limitations:
//1. image_width must be a multiple of 2.
//2. I and J must be two separate arrays (in place computation is not supported).
static void ConvertYUV444ToYUV422(const unsigned char I[],
                                  const int image_width,
                                  const int image_height,
                                  unsigned char J[])
{
    //I0 points to the current source row.
    const unsigned char *I0;    //I0 -> YUVYUVYUVYUV...

    //J0 points to the current destination row.
    unsigned char *J0;          //J0 -> YUYVYUYV...

    int y;  //Row index

    //In each iteration process a single row.
    for (y = 0; y < image_height; y++)
    {
        I0 = &I[y*image_width*3];   //Input row width is image_width*3 bytes (each pixel is Y,U,V).
        J0 = &J[y*image_width*2];   //Output row width is image_width*2 bytes (each two pixels are Y,U,Y,V).

        //Process a single source row into a single destination row
        ConvertRowYUV444ToYUV422(I0, image_width, J0);
    }
}
Planar representation of YUV 4:2:2
Planar representation may be more intuitive than "Pixel-Order" format.
In planar representation each color channel is represented as a separate matrix, which can be displayed as an image.
Example:
Original image in RGB format (before converting to YUV):
Image channels in YUV 4:4:4 format:
(Left YUV triple is represented in gray levels, and right YUV triple is represented using false colors).
Image channels in YUV 4:2:2 format (after horizontal Chroma subsampling):
(Left YUV triple is represented in gray levels, and right YUV triple is represented using "false colors").
As you can see, in the 4:2:2 format the U and V channels are down-sampled (shrunk) along the horizontal axis.
Remark:
The "false colors" representation of U and V channels is used for emphasizing that Y is the Luma channel and U and V are the Chrominance channels.
Higher order interpolation and Anti-Aliasing filter:
The following MATLAB code sample shows how to perform down-sampling with higher-order interpolation and an anti-aliasing filter.
The sample also shows the down-sampling method used by FFMPEG.
Note: you don't need to know MATLAB programming in order to understand the samples.
You do need some knowledge of image filtering by convolution between a kernel and an image.
%Prepare the input:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
load('mandrill.mat', 'X', 'map'); %Load input image
RGB = im2uint8(ind2rgb(X, map)); %Convert to RGB (the mandrill sample image is an indexed image)
YUV = rgb2ycbcr(RGB); %Convert from RGB to YUV (MATLAB function rgb2ycbcr uses BT.601 conversion formula)
%Separate YUV to 3 planes (Y plane, U plane and V plane)
Y = YUV(:, :, 1);
U = YUV(:, :, 2);
V = YUV(:, :, 3);
U = double(U); %Work in double precision instead of uint8.
[M, N] = size(Y); %Image size is N columns by M rows.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Linear interpolation without Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Linear interpolation (without Anti-Aliasing filter).
%Simple averaging is equivalent to linear interpolation.
U2 = (U(:, 1:2:end) + U(:, 2:2:end))/2;
refU2 = imresize(U, [M, N/2], 'bilinear', 'Antialiasing', false); %Use MATLAB imresize function as reference
disp(['Linear interpolation max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Cubic interpolation without Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Cubic interpolation (without Anti-Aliasing filter).
%Following operations are equivalent to cubic interpolation:
%1. Convolution with filter kernel [-0.125, 1.25, -0.125]
%2. Averaging pair elements
fU = imfilter(U, [-0.125, 1.25, -0.125], 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
U2 = max(min(U2, 240), 16); %Limit to valid range of U elements (valid range of U elements in uint8 format is [16, 240])
refU2 = imresize(U, [M, N/2], 'cubic', 'Antialiasing', false); %Use MATLAB imresize function as reference
refU2 = max(min(refU2, 240), 16); %Limit to valid range of U elements
disp(['Cubic interpolation max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Linear interpolation with Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Linear interpolation with Anti-Aliasing filter.
%Remark: The Anti-Aliasing filter is the filter used by MATLAB specific implementation of 'bilinear' imresize.
%Following operations are equivalent to Linear interpolation with Anti-Aliasing filter:
%1. Convolution with filter kernel [0.25, 0.5, 0.25]
%2. Averaging pair elements
fU = imfilter(U, [0.25, 0.5, 0.25], 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
refU2 = imresize(U, [M, N/2], 'bilinear', 'Antialiasing', true); %Use MATLAB imresize function as reference
disp(['Linear interpolation with Anti-Aliasing max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Cubic interpolation with Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Cubic interpolation with Anti-Aliasing filter.
%Remark: The Anti-Aliasing filter is the filter used by MATLAB specific implementation of 'cubic' imresize.
%Following operations are equivalent to Linear interpolation with Anti-Aliasing filter:
%1. Convolution with filter kernel [-0.0234375, -0.046875, 0.2734375, 0.59375, 0.2734375, -0.046875, -0.0234375]
%2. Averaging pair elements
h = [-0.0234375, -0.046875, 0.2734375, 0.59375, 0.2734375, -0.046875, -0.0234375];
fU = imfilter(U, h, 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
U2 = max(min(U2, 240), 16); %Limit to valid range of U elements
refU2 = imresize(U, [M, N/2], 'cubic', 'Antialiasing', true); %Use MATLAB imresize function as reference
refU2 = max(min(refU2, 240), 16); %Limit to valid range of U elements
disp(['Cubic interpolation with Anti-Aliasing max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%FFMPEG implementation of horizontal down-sampling U plane.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%FFMPEG uses cubic interpolation with Anti-Aliasing filter (different filter kernel):
%Remark: I didn't check the source code of FFMPEG to verify the values of the filter kernel.
%I can't tell how FFMPEG actually implements the conversion.
%Following operations are equivalent to FFMPEG implementation (with minor differences):
%1. Convolution with filter kernel [-115, -231, 1217, 2354, 1217, -231, -115]/4096
%2. Averaging pair elements
h = [-115, -231, 1217, 2354, 1217, -231, -115]/4096;
fU = imfilter(U, h, 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
U2 = max(min(U2, 240), 16); %Limit to valid range of U elements (FFMPEG actually doesn't limit the result)
%Save Y,U,V planes to file in format supported by FFMPEG
f = fopen('yuv444.yuv', 'w');
fwrite(f, Y', 'uint8');
fwrite(f, U', 'uint8');
fwrite(f, V', 'uint8');
fclose(f);
%For executing FFMPEG within MATLAB, download FFMPEG and place the executable in working directory (ffmpeg.exe for Windows)
%FFMPEG converts source file in YUV444 format to destination file in YUV422 format.
if isunix
[status, cmdout] = system(['./ffmpeg -y -s ', num2str(N), 'x', num2str(M), ' -pix_fmt yuv444p -i yuv444.yuv -pix_fmt yuv422p yuv422.yuv']);
else
[status, cmdout] = system(['ffmpeg.exe -y -s ', num2str(N), 'x', num2str(M), ' -pix_fmt yuv444p -i yuv444.yuv -pix_fmt yuv422p yuv422.yuv']);
end
f = fopen('yuv422.yuv', 'r');
refY = (fread(f, [N, M], '*uint8'))';
refU2 = (fread(f, [N/2, M], '*uint8'))'; %Read down-sampled U plane (FFMPEG result from file).
refV2 = (fread(f, [N/2, M], '*uint8'))';
fclose(f);
%Limit to valid range of U elements.
%In FFMPEG down-sampled U and V may exceed valid range (there is probably a way to tell FFMPEG to limit the result).
refU2 = max(min(refU2, 240), 16);
%Difference exclude first column and last column (FFMPEG treats the margins different than MATLAB)
%Remark: There are minor differences due to rounding (I guess).
disp(['FFMPEG Cubic interpolation with Anti-Aliasing max diff = ' num2str(max(max(abs(double(U2(:, 2:end-1)) - double(refU2(:, 2:end-1))))))]);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Examples of different kinds of down-sampling methods.
Linear interpolation versus Cubic interpolation with Anti-Aliasing filter:
In the first example (mandrill) there are no visible differences.
In the second example (circle and rectangle) there are minor visible differences.
The third example (lines) demonstrates aliasing artifacts.
Remark: the displayed images were up-sampled from YUV422 to YUV444 using cubic interpolation and converted from YUV444 to RGB.
Linear interpolation versus Cubic with Anti-Aliasing (mandrill):
Linear interpolation versus Cubic with Anti-Aliasing (circle and rectangle):
Linear interpolation versus Cubic with Anti-Aliasing (demonstrates Aliasing artifacts):

How do you map Kinect's depth data to its RGB color?

I'm working with a given dataset using OpenCV, without any Kinect by my side, and I would like to map the given depth data to its RGB counterpart (so that I can get the actual color and the depth).
Since I'm using OpenCV and C++ and don't own a Kinect, sadly I can't use the MapDepthFrameToColorFrame method from the official Kinect API.
From the given cameras' intrinsics and distortion coefficients, I could map the depth to world coordinates, and back to RGB, based on the algorithm provided here:
Vec3f depthToW(int x, int y, float depth) {
    Vec3f result;
    result[0] = (float) (x - depthCX) * depth / depthFX;
    result[1] = (float) (y - depthCY) * depth / depthFY;
    result[2] = (float) depth;
    return result;
}

Vec2i wToRGB(const Vec3f &point) {
    Mat p3d(point);
    p3d = extRotation * p3d + extTranslation;

    float x = p3d.at<float>(0, 0);
    float y = p3d.at<float>(1, 0);
    float z = p3d.at<float>(2, 0);

    Vec2i result;
    result[0] = (int) round((x * rgbFX / z) + rgbCX);
    result[1] = (int) round((y * rgbFY / z) + rgbCY);
    return result;
}

void map(Mat &rgb, Mat &depth) {
    /* intrinsics are focal points and centers of camera */
    undistort(rgb, rgb, rgbIntrinsic, rgbDistortion);
    undistort(depth, depth, depthIntrinsic, depthDistortion);

    Mat color = Mat(depth.size(), CV_8UC3, Scalar(0));
    ushort *raw_image_ptr;

    for (int y = 0; y < depth.rows; y++) {
        raw_image_ptr = depth.ptr<ushort>(y);
        for (int x = 0; x < depth.cols; x++) {
            if (raw_image_ptr[x] >= 2047 || raw_image_ptr[x] <= 0)
                continue;

            float depth_value = depthMeters[raw_image_ptr[x]];
            Vec3f depth_coord = depthToW(y, x, depth_value);
            Vec2i rgb_coord = wToRGB(depth_coord);
            color.at<Vec3b>(y, x) = rgb.at<Vec3b>(rgb_coord[0], rgb_coord[1]);
        }
    }
}
But the result seems to be misaligned. I can't manually set the translations, since the dataset was obtained from 3 different Kinects, each misaligned in a different direction. You can see one of them below (left: undistorted RGB, middle: undistorted depth, right: RGB mapped to depth).
My question is: what should I do at this point? Did I miss a step while projecting depth to world, or world back to RGB? Can anyone with stereo camera experience point out my missteps?
I assume you would need to calibrate the depth sensor with the RGB data in the same way you would calibrate a pair of stereo cameras. OpenCV has some functions (and tutorials) that you may be able to leverage.
A few other things that may be useful
http://www.ros.org/wiki/kinect_calibration/technical
https://github.com/robbeofficial/KinectCalib
http://www.mathworks.com/matlabcentral/linkexchange/links/2882-kinect-calibration-toolbox
This contains a paper on how to do it.
OpenCV has no functions for aligning a depth stream to a color video stream. But I know that there is a special function named MapDepthFrameToColorFrame in the "Kinect for Windows SDK".
I have no example code, but I hope this is a good point to start from.
Update:
Here is an example of mapping a color image to depth using the Kinect SDK with an interface to OpenCV (not my code).
It looks like you are not taking into account the extrinsics between the two cameras in your solution.
Yes, you didn't consider the transformation between RGB and depth.
But you can compute this matrix using the cv::stereoCalibrate() method - just pass it the image sequences of both the RGB and depth cameras with checkerboard corners.
You can find the details in the OpenCV documentation:
http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#double stereoCalibrate(InputArrayOfArrays objectPoints, InputArrayOfArrays imagePoints1, InputArrayOfArrays imagePoints2, InputOutputArray cameraMatrix1, InputOutputArray distCoeffs1, InputOutputArray cameraMatrix2, InputOutputArray distCoeffs2, Size imageSize, OutputArray R, OutputArray T, OutputArray E, OutputArray F, TermCriteria criteria, int flags)
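A hedged sketch of such a call, following the 2.4-style argument order quoted above (the point lists are assumed to come from findChessboardCorners on synchronized RGB/depth image pairs; all names are illustrative):
#include <vector>
#include <opencv2/opencv.hpp>

// Estimate R, T between the depth and RGB cameras from checkerboard corners detected in both
// streams. objectPoints / imagePointsDepth / imagePointsRgb are assumed to be filled from
// findChessboardCorners on synchronized image pairs; the intrinsics are kept fixed.
// Note: in OpenCV 3.x/4.x the flags and criteria arguments are swapped.
void calibrateDepthToColor(const std::vector<std::vector<cv::Point3f> > &objectPoints,
                           const std::vector<std::vector<cv::Point2f> > &imagePointsDepth,
                           const std::vector<std::vector<cv::Point2f> > &imagePointsRgb,
                           cv::Mat &depthK, cv::Mat &depthDist,
                           cv::Mat &rgbK, cv::Mat &rgbDist,
                           cv::Size imageSize, cv::Mat &R, cv::Mat &T)
{
    cv::Mat E, F;
    cv::stereoCalibrate(objectPoints, imagePointsDepth, imagePointsRgb,
                        depthK, depthDist, rgbK, rgbDist, imageSize,
                        R, T, E, F,
                        cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 100, 1e-6),
                        cv::CALIB_FIX_INTRINSIC);
}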
And the whole method idea behind this is:
color uv <- color normalize <- color space <- DtoC transformation <- depth space <- depth normalize <- depth uv
(uc, vc) <- <(xc*fx_rgb/zc) + cx_rgb, (yc*fy_rgb/zc) + cy_rgb> <- ExtrCol * (pc) <- stereo calibrate MAT <- ExtrDep^-1 * (pd) <- <(ud - cx)*d / fx, (vd - cy)*d / fy, d> <- (ud, vd)
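As a rough illustration of that pipeline (not code from this answer; names and types are assumptions), mapping one depth pixel with metric depth d into RGB pixel coordinates could look like this:
#include <opencv2/opencv.hpp>

// Map one depth pixel (ud, vd) with metric depth d into RGB pixel coordinates, using the
// depth/RGB intrinsics and the R, T returned by stereoCalibrate (depth camera -> RGB camera).
cv::Point2i depthPixelToColorPixel(int ud, int vd, float d,
                                   const cv::Matx33f &Kd, const cv::Matx33f &Kc,
                                   const cv::Matx33f &R, const cv::Vec3f &T)
{
    // Depth normalize: back-project the depth pixel to a 3D point in the depth camera frame.
    cv::Vec3f pd((ud - Kd(0, 2)) * d / Kd(0, 0),
                 (vd - Kd(1, 2)) * d / Kd(1, 1),
                 d);

    // DtoC transformation: move the point into the RGB camera frame with the stereo extrinsics.
    cv::Vec3f pc = R * pd + T;

    // Color normalize: project with the RGB intrinsics.
    int uc = cvRound(pc[0] * Kc(0, 0) / pc[2] + Kc(0, 2));
    int vc = cvRound(pc[1] * Kc(1, 1) / pc[2] + Kc(1, 2));
    return cv::Point2i(uc, vc);
}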
If you also want to handle the distortion of the RGB image, you just need to follow the steps in:
http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
