Converting YUV422 to RGB using GPU shader HLSL - directx

I'm considering to perform the color space conversion from YUV422 to RGB using HLSL. A four-byte YUYV will yield 2 three-byte RGB values, for example, Y1UY2V will give R1G1B1(left pixel) and R2G2B2(right pixel). Given texture coordinates in pixel shader increased gradiently, how could I differentiate between the texture coordinates for the left pixels i.e. all R1G1B1 and the texture coordinates for right pixels i.e. all R2G2B2. This way I could render all R1G1B1 and all R2G2B2 on a single texture instead of two.
Thanks!

Not sure what version of DirectX you use, but here is the version I use for dx11 (please note in that case I send yuv data in a StructuredBuffer, which saves me the fact of dealing with row stride. You can apply the same technique sending your yuv data as texture of course (with few little changes to the code below).
Here is the pixel shader code (I assume your render target is same size as your input image, and that you render a full screen quad/triangle).
StructuredBuffer<uint> yuy;
int w;
int h;
struct psInput
{
float4 p : SV_Position;
float2 uv : TEXCOORD0;
};
float4 PS(psInput input) : SV_Target
{
//Calculate pixel location within buffer (if you use texture change lookup here)
uint2 xy = input.p.xy;
uint p = (xy.x) + (xy.y * w);
uint pixloc = p / 2;
uint pixdata = yuy[pixloc];
//Since pixdata is packed, use some bitshift to remove non useful data
uint v = (pixdata & 0xff000000) >> 24;
uint y1 = (pixdata & 0xff0000) >> 16;
uint u = (pixdata & 0xff00) >> 8;
uint y0 = pixdata & 0x000000FF;
//Check if you are left/right pixel
uint y = p % 2 == 0 ? y0: y1;
//Convert yuv to rgb
float cb = u;
float cr = v;
float r = (y + 1.402 * (cr - 128.0));
float g = (y - 0.344 * (cb - 128.0) - 0.714 * (cr - 128));
float b = (y + 1.772 * (cb - 128));
return float4(r,g,b,1.0f) / 256.0f;
}
Hope that helps.

Related

How to reproduce Photoshop's multiply blending in OpenCV?

I'm trying to reproduce Photoshop's multiply blend mode in OpenCV. Equivalents to this would be what you find in GIMP, or when you use the CIMultiplyBlendMode in Apple's CoreImage framework.
Everything I read online suggests that multiply blending is accomplished simply by multiplying the channels of the two input images (i.e., Blend = AxB). And, this works, except for the case(s) where alpha is < 1.0.
You can test this very simply in GIMP/PhotoShop/CoreImage by creating two layers/images, filling each with a different solid color, and then modifying the opacity of the first layer. (BTW, when you modify alpha, the operation is no longer commutative in GIMP for some reason.)
A simple example: if A = (0,0,0,0) and B = (0.4,0,0,1.0), and C = AxB, then I would expect C to be (0,0,0,0). This is simple multiplication. But this is not how this blend is implemented in practice. In practice, C = (0.4,0,0,1.0), or C = B.
The bottom line is this: I need to figure out the formula for the multiply blend mode (which is clearly more than AxB) and then implement it in OpenCV (which should be trivial once I have the formula).
Would appreciate any insights.
Also, for reference, here are some links which show multiply blend as being simply AxB:
How does photoshop blend two images together
Wikipedia - Blend Modes
Photoshop Blend Modes
Here is an OpenCV solution based the source code of GIMP, specifically the function gimp_operation_multiply_mode_process_pixels.
NOTE
Instead of looping on all pixels it can be vectorized, but I followed the steps of GIMP.
Input images must be of type CV_8UC3 or CV_8UC4.
it supports also the opacity value, that must be in [0, 255]
in the original GIMP implementation there is also the support for a mask. It can be trivially added to the code, eventually.
This implementation is in fact not symmetrical, and reproduce your strange behaviour.
Code:
#include <opencv2\opencv.hpp>
using namespace cv;
Mat blend_multiply(const Mat& level1, const Mat& level2, uchar opacity)
{
CV_Assert(level1.size() == level2.size());
CV_Assert(level1.type() == level2.type());
CV_Assert(level1.channels() == level2.channels());
// Get 4 channel float images
Mat4f src1, src2;
if (level1.channels() == 3)
{
Mat4b tmp1, tmp2;
cvtColor(level1, tmp1, COLOR_BGR2BGRA);
cvtColor(level2, tmp2, COLOR_BGR2BGRA);
tmp1.convertTo(src1, CV_32F, 1. / 255.);
tmp2.convertTo(src2, CV_32F, 1. / 255.);
}
else
{
level1.convertTo(src1, CV_32F, 1. / 255.);
level2.convertTo(src2, CV_32F, 1. / 255.);
}
Mat4f dst(src1.rows, src1.cols, Vec4f(0., 0., 0., 0.));
// Loop on every pixel
float fopacity = opacity / 255.f;
float comp_alpha, new_alpha;
for (int r = 0; r < src1.rows; ++r)
{
for (int c = 0; c < src2.cols; ++c)
{
const Vec4f& v1 = src1(r, c);
const Vec4f& v2 = src2(r, c);
Vec4f& out = dst(r, c);
comp_alpha = min(v1[3], v2[3]) * fopacity;
new_alpha = v1[3] + (1.f - v1[3]) * comp_alpha;
if ((comp_alpha > 0.) && (new_alpha > 0.))
{
float ratio = comp_alpha / new_alpha;
out[0] = max(0.f, min(v1[0] * v2[0], 1.f)) * ratio + (v1[0] * (1.f - ratio));
out[1] = max(0.f, min(v1[1] * v2[1], 1.f)) * ratio + (v1[1] * (1.f - ratio));
out[2] = max(0.f, min(v1[2] * v2[2], 1.f)) * ratio + (v1[2] * (1.f - ratio));
}
else
{
out[0] = v1[0];
out[1] = v1[1];
out[2] = v1[2];
}
out[3] = v1[3];
}
}
Mat3b dst3b;
Mat4b dst4b;
dst.convertTo(dst4b, CV_8U, 255.);
cvtColor(dst4b, dst3b, COLOR_BGRA2BGR);
return dst3b;
}
int main()
{
Mat3b layer1 = imread("path_to_image_1");
Mat3b layer2 = imread("path_to_image_2");
Mat blend = blend_multiply(layer1, layer2, 255);
return 0;
}
I managed to sort this out. Feel free to comment with any suggested improvements.
First, I found a clue as to how to implement the multiply function in this post:
multiply blending
And here's a quick OpenCV implementation in C++.
Mat MultiplyBlend(const Mat& cvSource, const Mat& cvBackground) {
// assumption: cvSource and cvBackground are of type CV_8UC4
// formula: (cvSource.rgb * cvBackground.rgb * cvSource.a) + (cvBackground.rgb * (1-cvSource.a))
Mat cvAlpha(cvSource.size(), CV_8UC3, Scalar::all(0));
Mat input[] = { cvSource };
int from_to[] = { 3,0, 3,1, 3,2 };
mixChannels(input, 1, &cvAlpha, 1, from_to, 3);
Mat cvBackgroundCopy;
Mat cvSourceCopy;
cvtColor(cvSource, cvSourceCopy, CV_RGBA2RGB);
cvtColor(cvBackground, cvBackgroundCopy, CV_RGBA2RGB);
// A = cvSource.rgb * cvBackground.rgb * cvSource.a
Mat cvBlendResultLeft;
multiply(cvSourceCopy, cvBackgroundCopy, cvBlendResultLeft, 1.0 / 255.0);
multiply(cvBlendResultLeft, cvAlpha, cvBlendResultLeft, 1.0 / 255.0);
delete(cvSourceCopy);
// invert alpha
bitwise_not(cvAlpha, cvAlpha);
// B = cvBackground.rgb * (1-cvSource.a)
Mat cvBlendResultRight;
multiply(cvBackgroundCopy, cvAlpha, cvBlendResultRight, 1.0 / 255.0);
delete(cvBackgroundCopy, cvAlpha);
// A + B
Mat cvBlendResult;
add(cvBlendResultLeft, cvBlendResultRight, cvBlendResult);
delete(cvBlendResultLeft, cvBlendResultRight);
cvtColor(cvBlendResult, cvBlendResult, CV_RGB2RGBA);
return cvBlendResult;
}

Image processing: interpolation using intensity values of pixels in the input image

When we do image interpolation, I think we will use intensity values of pixels in the input image.
(A)
I am reading the code of cubic interpolation from GPU Gems Chapter 24. High-Quality Filtering. Here is a snippet of their code:
Example 24-9. Filtering Four Texel Rows, Then Filtering the Results as a Column
float4 texRECT_bicubic(uniform
samplerRECT tex,
uniform
samplerRECT kernelTex,
float2 t)
{
float2 f = frac(t); // we want the sub-texel portion
float4 t0 = cubicFilter(kernelTex, f.x,
texRECT(tex, t + float2(-1, -1)),
texRECT(tex, t + float2(0, -1)),
texRECT(tex, t + float2(1, -1)),
texRECT(tex, t + float2(2, -1)));
Since they get the sub-texel protion from frac(t), "t" is not exactly on pixel positions of the input image.
Then how come "t" is directly used to sample intensity values from the original images, like in "texRECT(tex, t + float2(-1, -1))"?
Personally, I think we should use
t - frac(t)
(B)
Same in an example from "Zoom An Image With Different Interpolation Types"
Their snippet of "GLSL shader code for Bi-Cubic Interpolation" is:
float a = fract( TexCoord.x * fWidth ); // get the decimal part
float b = fract( TexCoord.y * fHeight ); // get the decimal part
for( int m = -1; m <=2; m++ )
{
for( int n =-1; n<= 2; n++)
{
vec4 vecData = texture2D(textureSampler,
TexCoord + vec2(texelSizeX * float( m ),
texelSizeY * float( n )));
I think we should use:
TexCoord - vec2(a,b)
then use offset of m and n
(C) Now I am confused. I think we will use intensity values of "exact" pixels in the input image.
Which way should we use?

How do you map Kinect's depth data to its RGB color?

I'm working with a given dataset using OpenCV, without any Kinect by my side. And I would like to map the given depth data to its RGB counterpart (so that I can get the actual color and the depth)
Since I'm using OpenCV and C++, and don't own a Kinect, sadly I can't utilize MapDepthFrameToColorFrame method from the official Kinect API.
From the given cameras' intrinsics and distortion coefficients, I could map the depth to world coordinates, and back to RGB based on the algorithm provided here
Vec3f depthToW( int x, int y, float depth ){
Vec3f result;
result[0] = (float) (x - depthCX) * depth / depthFX;
result[1] = (float) (y - depthCY) * depth / depthFY;
result[2] = (float) depth;
return result;
}
Vec2i wToRGB( const Vec3f & point ) {
Mat p3d( point );
p3d = extRotation * p3d + extTranslation;
float x = p3d.at<float>(0, 0);
float y = p3d.at<float>(1, 0);
float z = p3d.at<float>(2, 0);
Vec2i result;
result[0] = (int) round( (x * rgbFX / z) + rgbCX );
result[1] = (int) round( (y * rgbFY / z) + rgbCY );
return result;
}
void map( Mat& rgb, Mat& depth ) {
/* intrinsics are focal points and centers of camera */
undistort( rgb, rgb, rgbIntrinsic, rgbDistortion );
undistort( depth, depth, depthIntrinsic, depthDistortion );
Mat color = Mat( depth.size(), CV_8UC3, Scalar(0) );
ushort * raw_image_ptr;
for( int y = 0; y < depth.rows; y++ ) {
raw_image_ptr = depth.ptr<ushort>( y );
for( int x = 0; x < depth.cols; x++ ) {
if( raw_image_ptr[x] >= 2047 || raw_image_ptr[x] <= 0 )
continue;
float depth_value = depthMeters[ raw_image_ptr[x] ];
Vec3f depth_coord = depthToW( y, x, depth_value );
Vec2i rgb_coord = wToRGB( depth_coord );
color.at<Vec3b>(y, x) = rgb.at<Vec3b>(rgb_coord[0], rgb_coord[1]);
}
}
But the result seems to be misaligned. I can't manually set the translations, since the dataset is obtained from 3 different Kinects, and each of them are misaligned in different direction. You could see one of it below (Left: undistorted RGB, Middle: undistorted Depth, Right: mapped RGB to Depth)
My question is, what should I do at this point? Did I miss a step while trying to project either depth to world or world back to RGB? Can anyone who has experienced with stereo camera point out my missteps?
I assume you would need to calibrate the depth sensor with the RGB data in the same way you would calibrate a stereo cameras. OpenCV has some functions (and tutorials) that you may be able to leverage.
A few other things that may be useful
http://www.ros.org/wiki/kinect_calibration/technical
https://github.com/robbeofficial/KinectCalib
http://www.mathworks.com/matlabcentral/linkexchange/links/2882-kinect-calibration-toolbox
This contains a paper on how to do it.
OpenCV has no functions for aligning depth stream to color video stream. But I know that there is special function named MapDepthFrameToColorFrame in "Kinect for Windows SDK".
I have no code for example, but hope that this is good point to start.
Upd:
Here is same example of mapping color image to depth using KinectSDK with interface to OpenCV (not my code).
It looks like you are not considering in your solution the extrinsics between both cameras.
Yes, you didn't consider the transformation between RGB and Depth.
But you can compute this matrix by using cvStereoCalibrate() method which just pass the image sequences of both RGB and Depth with checkerboard corners to it.
The detail you can find in OpecvCV documentation:
http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#double stereoCalibrate(InputArrayOfArrays objectPoints, InputArrayOfArrays imagePoints1, InputArrayOfArrays imagePoints2, InputOutputArray cameraMatrix1, InputOutputArray distCoeffs1, InputOutputArray cameraMatrix2, InputOutputArray distCoeffs2, Size imageSize, OutputArray R, OutputArray T, OutputArray E, OutputArray F, TermCriteria criteria, int flags)
And the whole method idea behind this is:
color uv <- color normalize <- color space <- DtoC transformation <- depth space <- depth normalize <- depth uv
(uc,vc) <- <- ExtrCol * (pc) <- stereo calibrate MAT <- ExtrDep^-1 * (pd) <- <(ud - cx)*d / fx, (vd-cy)*d/fy, d> <- (ud, vd)
If you want to add distortion to RGB, you just need to follow the step in:
http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html

HLSL Modify Tessellation shader to make equilateral triangles?

Details:
I'm in the proccess of procedural planet generation; so far I have done the dynamic LOD work, but my current software algorithm is very very slow. I decided to do it using DX11's new tessellation features instead.
Currently my sphere is a subdivided icosahedron. (20 sides all equilateral triangles)
Back when I was subdividing using my software algorithm, one triangle would be
split into four children across the midpoints of the parent forming the Hyrule symbol each time...like this: http://puu.sh/1xFIx
As you can see, each triangle subdivided created more and more equilateral triangles, i.e. each one was exactly the same shape.
But now that I am using the GPU to tessellate in HLSL, the result is definately not
what I am looking for: http://puu.sh/1xFx7
Questions:
Is there anything I can do in the Hull and Domain shaders to change the tessellation
so that it subdivides into sets of equilateral triangles like the first image?
Should I be using the geometry shader for something like this? If so, would it be
slower then the tessellator?
I tried using Tessellation Shader, but I encontred a problem: the domain shader only pass the uv coordinate (SV_DomainLocation) and the input patch for positionining the vertices, when the domain location for vertex is 0.3, 0.3, 0.3 (center vertex) is impossible to know the correct position because you need information about the other vertices or a index(x, y) of iteration that's not provided by the Domain Shader Stage.
because this problem I write the code in geometry shader, this shader is very limited for tessellations because the output stream cannot have a size bigger than 1024 bytes (in shader model 5.0). I implemented the calculation of vertex positions using the uv (like SV_DomainLocation) but this only tessellate the triangles, you must use part of your code to calculate added position in center of triangles to create the precise final result.
this is the code for equilateral triangles tessellation:
// required for array
#define MAX_ITERATIONS 5
void DrawTriangle(float4 p0, float4 p1, float4 p2, inout TriangleStream<VS_OUT> stream)
{
VS_OUT v0;
v0.pos = p0;
stream.Append(v0);
VS_OUT v1;
v1.pos = p1;
stream.Append(v1);
VS_OUT v2;
v2.pos = p2;
stream.Append(v2);
stream.RestartStrip();
}
[maxvertexcount(128)] // directx rule: maxvertexcount * sizeof(VS_OUT) <= 1024
void gs(triangle VS_OUT input[3], inout TriangleStream<VS_OUT> stream)
{
int itc = min(tess, MAX_ITERATIONS);
float fitc = itc;
float4 past_pos[MAX_ITERATIONS];
float4 array_pass[MAX_ITERATIONS];
for (int pi = 0; pi < MAX_ITERATIONS; pi++)
{
past_pos[pi] = float4(0, 0, 0, 0);
array_pass[pi] = float4(0, 0, 0, 0);
}
// -------------------------------------
// Tessellation kernel for the control points
for (int x = 0; x <= itc; x++)
{
float4 last;
for (int y = 0; y <= x; y++)
{
float2 seg = float2(x / fitc, y / fitc);
float3 uv;
uv.x = 1 - seg.x;
uv.z = seg.y;
uv.y = 1 - (uv.x + uv.z);
// ---------------------------------------
// Domain Stage
// uv Domain Location
// x,y IterationIndex
float4 fpos = input[0].pos * uv.x;
fpos += input[1].pos * uv.y;
fpos += input[2].pos * uv.z;
if (x > 0 && y > 0)
{
DrawTriangle(past_pos[y - 1], last, fpos, stream);
if (y < x)
{
// add adjacent triangle
DrawTriangle(past_pos[y - 1], fpos, past_pos[y], stream);
}
}
array_pass[y] = fpos;
last = fpos;
}
for (int i = 0; i < MAX_ITERATIONS; i++)
{
past_pos[i] = array_pass[i];
}
}
}

Dot Product and Luminance/ Findmyicone

All,
I have a basic question that I am struggling with here. When you look at the findmyicone sample code from WWDC 2010, you will see this:
static const uint8_t orangeColor[] = {255, 127, 0};
uint8_t referenceColor[3];
// Remove luminance
static inline void normalize( const uint8_t colorIn[], uint8_t colorOut[] ) {
// Dot product
int sum = 0;
for (int i = 0; i < 3; i++)
sum += colorIn[i] / 3;
for (int j = 0; j < 3; j++)
colorOut[j] = (float) ((colorIn[j] / (float) sum) * 255);
}
And then it is called:
normalize(orangeColor, referenceColor);
Running the debugger, it is converting BGRA: (Red 255, Green 127, Blue 0) to (Red 0, Green 255, Blue 0). I have looked on the web and SO to find details on luminance and dot product and there is really no information.
1- Can someone guide me on what this function is doing?
2- Can you guide me to some helpful topics/primer online as well?
Thanks again
KMB
What they're trying to do is track a particular color across variations in brightness, so they're normalizing for the luminance of the color. I do something similar in the fragment shader I use in a color tracking example based on a GPU Gems paper from Apple, as well as the ColorObjectTracking sample application in my GPUImage framework:
vec3 normalizeColor(vec3 color)
{
return color / max(dot(color, vec3(1.0/3.0)), 0.3);
}
vec4 maskPixel(vec3 pixelColor, vec3 maskColor)
{
float d;
vec4 calculatedColor;
// Compute distance between current pixel color and reference color
d = distance(normalizeColor(pixelColor), normalizeColor(maskColor));
// If color difference is larger than threshold, return black.
calculatedColor = (d > threshold) ? vec4(0.0) : vec4(1.0);
//Multiply color by texture
return calculatedColor;
}
The above calculation takes the average of the three color components by multiplying each channel by 1/3 and then summing them (that's what the dot product does here). It then divides each color channel by this average to arrive at a normalized color.
The distance between this normalized color and the target one is calculated, and if it is within a certain threshold the pixel is marked as being of that color.
This is just one way of determining proximity of one color to another. Another way is to convert the RGB values into Y, Cr, and Cb (Y, U, and V) components and then take the distance between just the chrominance portions (Cr and Cb):
vec4 textureColor = texture2D(inputImageTexture, textureCoordinate);
vec4 textureColor2 = texture2D(inputImageTexture2, textureCoordinate2);
float maskY = 0.2989 * colorToReplace.r + 0.5866 * colorToReplace.g + 0.1145 * colorToReplace.b;
float maskCr = 0.7132 * (colorToReplace.r - maskY);
float maskCb = 0.5647 * (colorToReplace.b - maskY);
float Y = 0.2989 * textureColor.r + 0.5866 * textureColor.g + 0.1145 * textureColor.b;
float Cr = 0.7132 * (textureColor.r - Y);
float Cb = 0.5647 * (textureColor.b - Y);
float blendValue = 1.0 - smoothstep(thresholdSensitivity, thresholdSensitivity + smoothing, distance(vec2(Cr, Cb), vec2(maskCr, maskCb)));
This code is what I use in a chroma keying shader, and it's based on a similar calculation that Apple uses in one of their sample applications. Which one is best can depend on the particular situation you're facing.

Resources