Image processing: interpolation using intensity values of pixels in the input image

When we do image interpolation, I think we will use intensity values of pixels in the input image.
(A)
I am reading the cubic interpolation code from GPU Gems, Chapter 24, "High-Quality Filtering". Here is a snippet of their code:
Example 24-9. Filtering Four Texel Rows, Then Filtering the Results as a Column
float4 texRECT_bicubic(uniform samplerRECT tex,
                       uniform samplerRECT kernelTex,
                       float2 t)
{
    float2 f = frac(t);  // we want the sub-texel portion
    float4 t0 = cubicFilter(kernelTex, f.x,
                            texRECT(tex, t + float2(-1, -1)),
                            texRECT(tex, t + float2(0, -1)),
                            texRECT(tex, t + float2(1, -1)),
                            texRECT(tex, t + float2(2, -1)));
Since they get the sub-texel portion from frac(t), "t" is not exactly on a pixel position of the input image.
Then how come "t" is used directly to sample intensity values from the original image, as in "texRECT(tex, t + float2(-1, -1))"?
Personally, I think we should use
t - frac(t)
(B)
The same thing appears in an example from "Zoom An Image With Different Interpolation Types".
Their snippet of "GLSL shader code for Bi-Cubic Interpolation" is:
float a = fract( TexCoord.x * fWidth );  // get the decimal part
float b = fract( TexCoord.y * fHeight ); // get the decimal part
for( int m = -1; m <= 2; m++ )
{
    for( int n = -1; n <= 2; n++ )
    {
        vec4 vecData = texture2D(textureSampler,
                                 TexCoord + vec2(texelSizeX * float( m ),
                                                 texelSizeY * float( n )));
I think we should use:
TexCoord - vec2(a,b)
and then apply the offsets of m and n.
(C) Now I am confused. I think we will use intensity values of "exact" pixels in the input image.
Which way should we use?
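For reference, the mental model in (C), weights computed from the fractional part and intensity values fetched at exact integer pixel positions, can be sketched on the CPU roughly like this; it is plain C++ with a Catmull-Rom kernel, and the helper names and border clamping are my own, not taken from either article:

#include <algorithm>
#include <cmath>
#include <vector>

// Catmull-Rom cubic kernel; weights a tap by its distance from the sample point.
static float cubicWeight(float x) {
    x = std::fabs(x);
    if (x < 1.0f) return 1.5f * x * x * x - 2.5f * x * x + 1.0f;
    if (x < 2.0f) return -0.5f * x * x * x + 2.5f * x * x - 4.0f * x + 2.0f;
    return 0.0f;
}

// image is row-major, width x height, single channel; (tx, ty) is a continuous coordinate.
float bicubicSample(const std::vector<float>& image, int width, int height,
                    float tx, float ty) {
    int ix = (int)std::floor(tx), iy = (int)std::floor(ty); // base (exact) pixel
    float fx = tx - ix, fy = ty - iy;                       // sub-pixel fractions
    float result = 0.0f;
    for (int n = -1; n <= 2; ++n) {
        for (int m = -1; m <= 2; ++m) {
            int px = std::min(std::max(ix + m, 0), width - 1);  // clamp at the border
            int py = std::min(std::max(iy + n, 0), height - 1);
            // intensities are read at exact pixel positions; only the weights use fx, fy
            result += image[py * width + px] * cubicWeight(m - fx) * cubicWeight(n - fy);
        }
    }
    return result;
}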

Related

Converting YUV422 to RGB using GPU shader HLSL

I'm considering performing the color space conversion from YUV422 to RGB using HLSL. A four-byte YUYV will yield two three-byte RGB values; for example, Y1UY2V will give R1G1B1 (left pixel) and R2G2B2 (right pixel). Given that the texture coordinates in the pixel shader increase gradually, how can I distinguish the texture coordinates for the left pixels (i.e. all R1G1B1) from those for the right pixels (i.e. all R2G2B2)? This way I could render all R1G1B1 and all R2G2B2 to a single texture instead of two.
Thanks!
Not sure what version of DirectX you use, but here is the version I use for DX11. (Note that in this case I send the YUV data in a StructuredBuffer, which saves me from dealing with the row stride. You can of course apply the same technique sending your YUV data as a texture, with a few small changes to the code below.)
Here is the pixel shader code (I assume your render target is the same size as your input image, and that you render a full-screen quad/triangle).
StructuredBuffer<uint> yuy;
int w;
int h;

struct psInput
{
    float4 p : SV_Position;
    float2 uv : TEXCOORD0;
};

float4 PS(psInput input) : SV_Target
{
    // Calculate pixel location within the buffer (if you use a texture, change the lookup here)
    uint2 xy = input.p.xy;
    uint p = (xy.x) + (xy.y * w);
    uint pixloc = p / 2;
    uint pixdata = yuy[pixloc];

    // Since pixdata is packed, use some bit shifts to extract the useful bytes
    uint v  = (pixdata & 0xff000000) >> 24;
    uint y1 = (pixdata & 0x00ff0000) >> 16;
    uint u  = (pixdata & 0x0000ff00) >> 8;
    uint y0 =  pixdata & 0x000000ff;

    // Check whether you are the left or the right pixel of the pair
    uint y = p % 2 == 0 ? y0 : y1;

    // Convert YUV to RGB
    float cb = u;
    float cr = v;
    float r = (y + 1.402 * (cr - 128.0));
    float g = (y - 0.344 * (cb - 128.0) - 0.714 * (cr - 128.0));
    float b = (y + 1.772 * (cb - 128.0));
    return float4(float3(r, g, b) / 256.0f, 1.0f); // scale only the color, keep alpha at 1
}
Hope that helps.
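If it helps to sanity-check the bit layout outside the shader, here is a small CPU-side sketch of the same unpack and conversion (plain C++; the byte order and the /256 normalization follow the shader above, everything else is illustrative):

#include <cstdint>
#include <cstdio>

struct Rgb { float r, g, b; };

// One packed YUYV word holds two pixels: byte 0 = Y0, byte 1 = U, byte 2 = Y1, byte 3 = V.
Rgb yuyvToRgb(uint32_t pixdata, bool rightPixel) {
    uint32_t v  = (pixdata & 0xff000000u) >> 24;
    uint32_t y1 = (pixdata & 0x00ff0000u) >> 16;
    uint32_t u  = (pixdata & 0x0000ff00u) >> 8;
    uint32_t y0 =  pixdata & 0x000000ffu;

    float y  = rightPixel ? (float)y1 : (float)y0;
    float cb = (float)u;
    float cr = (float)v;

    Rgb out;
    out.r = (y + 1.402f * (cr - 128.0f)) / 256.0f;
    out.g = (y - 0.344f * (cb - 128.0f) - 0.714f * (cr - 128.0f)) / 256.0f;
    out.b = (y + 1.772f * (cb - 128.0f)) / 256.0f;
    return out;
}

int main() {
    // Y0 = Y1 = 128, U = V = 128 should come out as a neutral mid grey (~0.5).
    Rgb left = yuyvToRgb(0x80808080u, /*rightPixel=*/false);
    std::printf("r=%.3f g=%.3f b=%.3f\n", left.r, left.g, left.b);
    return 0;
}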

How do you map Kinect's depth data to its RGB color?

I'm working with a given dataset using OpenCV, without any Kinect by my side, and I would like to map the given depth data to its RGB counterpart (so that I can get the actual color and the depth).
Since I'm using OpenCV and C++, and don't own a Kinect, sadly I can't utilize the MapDepthFrameToColorFrame method from the official Kinect API.
From the given cameras' intrinsics and distortion coefficients, I could map the depth to world coordinates, and back to RGB based on the algorithm provided here
Vec3f depthToW( int x, int y, float depth ){
    Vec3f result;
    result[0] = (float) (x - depthCX) * depth / depthFX;
    result[1] = (float) (y - depthCY) * depth / depthFY;
    result[2] = (float) depth;
    return result;
}

Vec2i wToRGB( const Vec3f & point ) {
    Mat p3d( point );
    p3d = extRotation * p3d + extTranslation;

    float x = p3d.at<float>(0, 0);
    float y = p3d.at<float>(1, 0);
    float z = p3d.at<float>(2, 0);

    Vec2i result;
    result[0] = (int) round( (x * rgbFX / z) + rgbCX );
    result[1] = (int) round( (y * rgbFY / z) + rgbCY );
    return result;
}

void map( Mat& rgb, Mat& depth ) {
    /* intrinsics are focal points and centers of camera */
    undistort( rgb, rgb, rgbIntrinsic, rgbDistortion );
    undistort( depth, depth, depthIntrinsic, depthDistortion );

    Mat color = Mat( depth.size(), CV_8UC3, Scalar(0) );
    ushort * raw_image_ptr;

    for( int y = 0; y < depth.rows; y++ ) {
        raw_image_ptr = depth.ptr<ushort>( y );

        for( int x = 0; x < depth.cols; x++ ) {
            if( raw_image_ptr[x] >= 2047 || raw_image_ptr[x] <= 0 )
                continue;

            float depth_value = depthMeters[ raw_image_ptr[x] ];
            Vec3f depth_coord = depthToW( y, x, depth_value );
            Vec2i rgb_coord   = wToRGB( depth_coord );
            color.at<Vec3b>(y, x) = rgb.at<Vec3b>(rgb_coord[0], rgb_coord[1]);
        }
    }
}
But the result seems to be misaligned. I can't manually set the translations, since the dataset is obtained from 3 different Kinects, and each of them is misaligned in a different direction. You can see one of them below (left: undistorted RGB, middle: undistorted depth, right: RGB mapped to depth).
My question is, what should I do at this point? Did I miss a step while trying to project either depth to world or world back to RGB? Can anyone who has experience with stereo cameras point out my missteps?
I assume you would need to calibrate the depth sensor against the RGB camera in the same way you would calibrate a pair of stereo cameras. OpenCV has some functions (and tutorials) that you may be able to leverage.
A few other things that may be useful:
http://www.ros.org/wiki/kinect_calibration/technical
https://github.com/robbeofficial/KinectCalib
http://www.mathworks.com/matlabcentral/linkexchange/links/2882-kinect-calibration-toolbox
This contains a paper on how to do it.
OpenCV has no function for aligning a depth stream to a color video stream. But I know that there is a special function named MapDepthFrameToColorFrame in the Kinect for Windows SDK.
I have no example code, but I hope this is a good point to start from.
Update:
Here is an example of mapping the color image to depth using the Kinect SDK with an OpenCV interface (not my code).
It looks like your solution does not take into account the extrinsics between the two cameras.
Yes, you didn't consider the transformation between the RGB and depth cameras.
But you can compute this matrix with the cvStereoCalibrate() method: just pass it image sequences with checkerboard corners from both the RGB and depth cameras.
The details can be found in the OpenCV documentation:
http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html (see stereoCalibrate: double stereoCalibrate(InputArrayOfArrays objectPoints, InputArrayOfArrays imagePoints1, InputArrayOfArrays imagePoints2, InputOutputArray cameraMatrix1, InputOutputArray distCoeffs1, InputOutputArray cameraMatrix2, InputOutputArray distCoeffs2, Size imageSize, OutputArray R, OutputArray T, OutputArray E, OutputArray F, TermCriteria criteria, int flags))
The whole idea behind the method is:
color uv <- color normalize <- color space <- DtoC transformation <- depth space <- depth normalize <- depth uv
(uc, vc) <- ((xc * fx_rgb / zc) + cx_rgb, (yc * fy_rgb / zc) + cy_rgb) <- ExtrCol * (pc) <- stereo-calibrated [R|T] <- ExtrDep^-1 * (pd) <- ((ud - cx) * d / fx, (vd - cy) * d / fy, d) <- (ud, vd)
If you want to handle the RGB distortion as well, you just need to follow the steps in:
http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
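For completeness, a minimal sketch of that stereoCalibrate call, assuming you have already collected checkerboard corners from synchronized RGB and depth/IR views (OpenCV 2.4 C++ API; the variable and function names are illustrative):

#include <opencv2/calib3d/calib3d.hpp>
#include <opencv2/core/core.hpp>
#include <vector>

using namespace cv;

// objectPoints: checkerboard corners in board coordinates, one vector per view.
// depthCorners / rgbCorners: detected corners in each camera for the same views.
void calibrateDepthToColor(const std::vector<std::vector<Point3f> >& objectPoints,
                           const std::vector<std::vector<Point2f> >& depthCorners,
                           const std::vector<std::vector<Point2f> >& rgbCorners,
                           Mat& depthIntrinsic, Mat& depthDistortion,
                           Mat& rgbIntrinsic, Mat& rgbDistortion,
                           Size imageSize, Mat& R, Mat& T)
{
    Mat E, F; // essential and fundamental matrices (not needed for the mapping itself)
    stereoCalibrate(objectPoints, depthCorners, rgbCorners,
                    depthIntrinsic, depthDistortion,
                    rgbIntrinsic, rgbDistortion,
                    imageSize, R, T, E, F,
                    TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 100, 1e-5),
                    CALIB_FIX_INTRINSIC);
    // R and T take points from the depth camera's coordinate system into the RGB
    // camera's, i.e. the extRotation / extTranslation pair used in wToRGB() above.
}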

Drawing marching ants using directx

I have to draw selection feedback like Photoshop's in my DirectX application. I came across an algorithm on Wikipedia to do this. But I am not sure if it's the right way to do it, especially since my selection area could be any arbitrary geometry. Has someone implemented it using DirectX? Any hints are much appreciated.
Based on my comment, here is a simple pixel shader that achieves the wanted result:
float4 PS( float4 pos : SV_POSITION) : SV_Target
{
    float w = ((int)(pos.x + pos.y + t) % 8);
    return (w < 4 ? float4(0,0,0,1) : float4(1,1,1,1));
}
x and y are added to produce the diagonal stripe pattern. You can imagine it as follows: If y is constant and x increases by 1, w also increases by 1. The same applies for y. So for w to stay constant, you have to go (x+1, y-1) or (x-1, y+1) (or other step sizes). We use the % operator to produce a periodicity of 8 pixels. The first half period is filled black and the second half white.
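To see the pattern outside a shader, here is a tiny CPU sketch (plain C++, purely illustrative) that evaluates the same expression and prints one frame of it; t stands in for whatever per-frame offset you feed the shader:

#include <cstdio>

int main() {
    int t = 0; // animation offset; increase it each frame and the stripes "march"
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 32; ++x) {
            int w = (x + y + t) % 8;          // same expression as the pixel shader
            std::putchar(w < 4 ? '#' : '.');  // '#' = black half period, '.' = white half
        }
        std::putchar('\n');
    }
    return 0;
}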
This is an equivalent but more performant shader. It uses a bit mask instead of modulo and comparison; w is either 0 or 4, and values above 1 are clamped when written to a typical UNORM render target, so the pattern comes out the same.
float4 PS( float4 pos : SV_POSITION) : SV_Target
{
    int w = ((int)(pos.x + pos.y + t) & 4);
    return float4(w,w,w,1);
}

HLSL Modify Tessellation shader to make equilateral triangles?

Details:
I'm in the process of procedural planet generation; so far I have done the dynamic LOD work, but my current software algorithm is very, very slow. I decided to do it using DX11's new tessellation features instead.
Currently my sphere is a subdivided icosahedron. (20 sides all equilateral triangles)
Back when I was subdividing using my software algorithm, one triangle would be split into four children across the midpoints of the parent, forming the Hyrule symbol each time, like this: http://puu.sh/1xFIx
As you can see, each subdivision created more and more equilateral triangles, i.e. each one was exactly the same shape.
But now that I am using the GPU to tessellate in HLSL, the result is definitely not what I am looking for: http://puu.sh/1xFx7
Questions:
Is there anything I can do in the hull and domain shaders to change the tessellation so that it subdivides into sets of equilateral triangles like in the first image?
Should I be using the geometry shader for something like this? If so, would it be slower than the tessellator?
I tried using the tessellation shader, but I ran into a problem: the domain shader only passes the uv coordinate (SV_DomainLocation) and the input patch for positioning the vertices. When the domain location of a vertex is (1/3, 1/3, 1/3) (the center vertex), it is impossible to know the correct position, because you would need information about the other generated vertices or an iteration index (x, y), which the domain shader stage does not provide.
Because of this problem, I wrote the code in the geometry shader instead. That shader is very limited for tessellation, because the output stream cannot exceed 1024 scalar components (in shader model 5.0). I implemented the calculation of the vertex positions using a uv coordinate (like SV_DomainLocation), but this only tessellates the triangles; you must still use part of your own code to calculate the added positions in the centers of the triangles to create the precise final result.
This is the code for equilateral-triangle tessellation:
// required for array
#define MAX_ITERATIONS 5

void DrawTriangle(float4 p0, float4 p1, float4 p2, inout TriangleStream<VS_OUT> stream)
{
    VS_OUT v0;
    v0.pos = p0;
    stream.Append(v0);

    VS_OUT v1;
    v1.pos = p1;
    stream.Append(v1);

    VS_OUT v2;
    v2.pos = p2;
    stream.Append(v2);

    stream.RestartStrip();
}
[maxvertexcount(128)] // DirectX rule: maxvertexcount * (scalar components of VS_OUT) <= 1024
void gs(triangle VS_OUT input[3], inout TriangleStream<VS_OUT> stream)
{
    int itc = min(tess, MAX_ITERATIONS);
    float fitc = itc;

    // a row can hold up to itc + 1 vertices, so size the arrays accordingly
    float4 past_pos[MAX_ITERATIONS + 1];
    float4 array_pass[MAX_ITERATIONS + 1];
    for (int pi = 0; pi <= MAX_ITERATIONS; pi++)
    {
        past_pos[pi]   = float4(0, 0, 0, 0);
        array_pass[pi] = float4(0, 0, 0, 0);
    }

    // -------------------------------------
    // Tessellation kernel for the control points
    for (int x = 0; x <= itc; x++)
    {
        float4 last;
        for (int y = 0; y <= x; y++)
        {
            float2 seg = float2(x / fitc, y / fitc);
            float3 uv;
            uv.x = 1 - seg.x;
            uv.z = seg.y;
            uv.y = 1 - (uv.x + uv.z);

            // ---------------------------------------
            // Domain Stage
            // uv   Domain Location
            // x,y  Iteration Index
            float4 fpos = input[0].pos * uv.x;
            fpos       += input[1].pos * uv.y;
            fpos       += input[2].pos * uv.z;

            if (x > 0 && y > 0)
            {
                DrawTriangle(past_pos[y - 1], last, fpos, stream);
                if (y < x)
                {
                    // add adjacent triangle
                    DrawTriangle(past_pos[y - 1], fpos, past_pos[y], stream);
                }
            }
            array_pass[y] = fpos;
            last = fpos;
        }
        for (int i = 0; i <= MAX_ITERATIONS; i++)
        {
            past_pos[i] = array_pass[i];
        }
    }
}
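To make the output limit concrete, here is a small budget calculation (plain C++; it assumes VS_OUT carries only a float4 position, i.e. 4 scalars per vertex):

#include <cstdio>

int main() {
    const int scalarsPerVertex = 4;    // float4 pos only
    const int maxScalars       = 1024; // SM 5.0 geometry shader output limit
    const int maxVertexCount   = 128;  // declared via [maxvertexcount(128)]

    for (int itc = 1; itc <= 9; ++itc) {
        int triangles = itc * itc;      // subdividing one triangle into itc rows
        int vertices  = 3 * triangles;  // RestartStrip() after every triangle
        int scalars   = vertices * scalarsPerVertex;
        std::printf("itc=%d: %3d triangles, %3d vertices, %4d scalars%s\n",
                    itc, triangles, vertices, scalars,
                    (vertices <= maxVertexCount && scalars <= maxScalars) ? "" : "  <-- over budget");
    }
    return 0;
}

With only a position per vertex, the declared maxvertexcount(128) runs out at six subdivision rows, and adding more attributes to VS_OUT lowers that limit further.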

Dot Product and Luminance/ Findmyicone

All,
I have a basic question that I am struggling with here. When you look at the findmyicone sample code from WWDC 2010, you will see this:
static const uint8_t orangeColor[] = {255, 127, 0};
uint8_t referenceColor[3];

// Remove luminance
static inline void normalize( const uint8_t colorIn[], uint8_t colorOut[] ) {
    // Dot product
    int sum = 0;
    for (int i = 0; i < 3; i++)
        sum += colorIn[i] / 3;

    for (int j = 0; j < 3; j++)
        colorOut[j] = (float) ((colorIn[j] / (float) sum) * 255);
}
And then it is called:
normalize(orangeColor, referenceColor);
Running the debugger, it converts BGRA (Red 255, Green 127, Blue 0) to (Red 0, Green 255, Blue 0). I have looked on the web and on SO for details on luminance and the dot product, and there is really not much information.
1- Can someone guide me on what this function is doing?
2- Can you guide me to some helpful topics/primer online as well?
Thanks again
KMB
What they're trying to do is track a particular color across variations in brightness, so they're normalizing for the luminance of the color. I do something similar in the fragment shader I use in a color tracking example based on a GPU Gems paper from Apple, as well as the ColorObjectTracking sample application in my GPUImage framework:
vec3 normalizeColor(vec3 color)
{
    return color / max(dot(color, vec3(1.0/3.0)), 0.3);
}

vec4 maskPixel(vec3 pixelColor, vec3 maskColor)
{
    float d;
    vec4 calculatedColor;

    // Compute distance between current pixel color and reference color
    d = distance(normalizeColor(pixelColor), normalizeColor(maskColor));

    // If color difference is larger than threshold, return black.
    calculatedColor = (d > threshold) ? vec4(0.0) : vec4(1.0);

    // Multiply color by texture
    return calculatedColor;
}
The above calculation takes the average of the three color components by multiplying each channel by 1/3 and then summing them (that's what the dot product does here). It then divides each color channel by this average (clamped below at 0.3 so very dark pixels don't blow up) to arrive at a normalized color.
The distance between this normalized color and the target one is calculated, and if it is within a certain threshold the pixel is marked as being of that color.
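Plugging the question's orange reference color into normalizeColor makes this concrete (plain C++, just the arithmetic, with the color expressed in the shader's 0..1 range):

#include <algorithm>
#include <cstdio>

int main() {
    // Orange (255, 127, 0) expressed in the shader's 0..1 range.
    float r = 255.0f / 255.0f, g = 127.0f / 255.0f, b = 0.0f / 255.0f;

    // dot(color, vec3(1/3)) is just the average of the three channels.
    float average = (r + g + b) / 3.0f;       // ~0.499
    float divisor = std::max(average, 0.3f);  // the 0.3 floor protects very dark pixels

    std::printf("normalized = (%.3f, %.3f, %.3f)\n",
                r / divisor, g / divisor, b / divisor); // ~ (2.00, 1.00, 0.00)
    return 0;
}

A brighter or darker orange lands near the same normalized value, which is what lets the comparison ignore luminance.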
This is just one way of determining proximity of one color to another. Another way is to convert the RGB values into Y, Cr, and Cb (Y, U, and V) components and then take the distance between just the chrominance portions (Cr and Cb):
vec4 textureColor = texture2D(inputImageTexture, textureCoordinate);
vec4 textureColor2 = texture2D(inputImageTexture2, textureCoordinate2);
float maskY = 0.2989 * colorToReplace.r + 0.5866 * colorToReplace.g + 0.1145 * colorToReplace.b;
float maskCr = 0.7132 * (colorToReplace.r - maskY);
float maskCb = 0.5647 * (colorToReplace.b - maskY);
float Y = 0.2989 * textureColor.r + 0.5866 * textureColor.g + 0.1145 * textureColor.b;
float Cr = 0.7132 * (textureColor.r - Y);
float Cb = 0.5647 * (textureColor.b - Y);
float blendValue = 1.0 - smoothstep(thresholdSensitivity, thresholdSensitivity + smoothing, distance(vec2(Cr, Cb), vec2(maskCr, maskCb)));
This code is what I use in a chroma keying shader, and it's based on a similar calculation that Apple uses in one of their sample applications. Which one is best can depend on the particular situation you're facing.
