Dot Product and Luminance / FindMyiCone - iOS

All,
I have a basic question that I am struggling with here. When you look at the FindMyiCone sample code from WWDC 2010, you will see this:
static const uint8_t orangeColor[] = {255, 127, 0};
uint8_t referenceColor[3];

// Remove luminance
static inline void normalize(const uint8_t colorIn[], uint8_t colorOut[]) {
    // Dot product
    int sum = 0;
    for (int i = 0; i < 3; i++)
        sum += colorIn[i] / 3;

    for (int j = 0; j < 3; j++)
        colorOut[j] = (float) ((colorIn[j] / (float) sum) * 255);
}
And then it is called:
normalize(orangeColor, referenceColor);
Running it in the debugger, it converts BGRA (Red 255, Green 127, Blue 0) to (Red 0, Green 255, Blue 0). I have looked on the web and on SO for details on luminance and the dot product, but haven't found much.
1- Can someone guide me on what this function is doing?
2- Can you guide me to some helpful topics/primer online as well?
Thanks again
KMB

What they're trying to do is track a particular color across variations in brightness, so they're normalizing for the luminance of the color. I do something similar in the fragment shader I use in a color tracking example based on a GPU Gems paper from Apple, as well as the ColorObjectTracking sample application in my GPUImage framework:
vec3 normalizeColor(vec3 color)
{
    return color / max(dot(color, vec3(1.0/3.0)), 0.3);
}

vec4 maskPixel(vec3 pixelColor, vec3 maskColor)
{
    float d;
    vec4 calculatedColor;

    // Compute distance between current pixel color and reference color
    d = distance(normalizeColor(pixelColor), normalizeColor(maskColor));

    // If color difference is larger than threshold, return black.
    calculatedColor = (d > threshold) ? vec4(0.0) : vec4(1.0);

    // Multiply color by texture
    return calculatedColor;
}
The above calculation takes the average of the three color components by multiplying each channel by 1/3 and then summing them (that's what the dot product does here). It then divides each color channel by this average to arrive at a normalized color.
The distance between this normalized color and the target one is calculated, and if it is within a certain threshold the pixel is marked as being of that color.
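To make the arithmetic concrete, here is a small C++ sketch of the same normalization done in floating point (illustrative only, not the WWDC code). It also shows why the integer version in the sample appears to produce (0, 255, 0): 255 / 127 * 255 ≈ 512, which wraps around when stored in a uint8_t.

#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>

using Color = std::array<float, 3>;

Color normalizeColor(const Color& c) {
    // Dot product with (1/3, 1/3, 1/3) == average of the channels (rough luminance).
    float avg = (c[0] + c[1] + c[2]) / 3.0f;
    float denom = std::max(avg, 0.3f);            // same floor as the shader above
    return { c[0] / denom, c[1] / denom, c[2] / denom };
}

float colorDistance(const Color& a, const Color& b) {
    Color na = normalizeColor(a), nb = normalizeColor(b);
    float dx = na[0] - nb[0], dy = na[1] - nb[1], dz = na[2] - nb[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

int main() {
    Color orange    = { 255 / 255.0f, 127 / 255.0f, 0.0f };           // reference color
    Color candidate = { 200 / 255.0f, 100 / 255.0f, 10 / 255.0f };    // a sample pixel
    // In the integer version, 255 / 127 * 255 ≈ 512, which wraps to 0 in a uint8_t;
    // that is why the debugger shows (0, 255, 0) for the reference color.
    std::printf("distance = %f\n", colorDistance(orange, candidate));
}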
This is just one way of determining proximity of one color to another. Another way is to convert the RGB values into Y, Cr, and Cb (Y, U, and V) components and then take the distance between just the chrominance portions (Cr and Cb):
vec4 textureColor = texture2D(inputImageTexture, textureCoordinate);
vec4 textureColor2 = texture2D(inputImageTexture2, textureCoordinate2);
float maskY = 0.2989 * colorToReplace.r + 0.5866 * colorToReplace.g + 0.1145 * colorToReplace.b;
float maskCr = 0.7132 * (colorToReplace.r - maskY);
float maskCb = 0.5647 * (colorToReplace.b - maskY);
float Y = 0.2989 * textureColor.r + 0.5866 * textureColor.g + 0.1145 * textureColor.b;
float Cr = 0.7132 * (textureColor.r - Y);
float Cb = 0.5647 * (textureColor.b - Y);
float blendValue = 1.0 - smoothstep(thresholdSensitivity, thresholdSensitivity + smoothing, distance(vec2(Cr, Cb), vec2(maskCr, maskCb)));
This code is what I use in a chroma keying shader, and it's based on a similar calculation that Apple uses in one of their sample applications. Which one is best can depend on the particular situation you're facing.
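For illustration, here is a CPU-side C++ sketch of the same chroma-only comparison on a single pixel (my own sketch, not the shader above); the luma/chroma constants are copied from the shader, and the threshold value is an arbitrary placeholder.

#include <cmath>

struct RGB { float r, g, b; };   // channels in [0, 1]

float chromaDistance(RGB a, RGB b) {
    // Same luma/chroma constants as the shader above.
    float yA  = 0.2989f * a.r + 0.5866f * a.g + 0.1145f * a.b;
    float crA = 0.7132f * (a.r - yA);
    float cbA = 0.5647f * (a.b - yA);
    float yB  = 0.2989f * b.r + 0.5866f * b.g + 0.1145f * b.b;
    float crB = 0.7132f * (b.r - yB);
    float cbB = 0.5647f * (b.b - yB);
    return std::sqrt((crA - crB) * (crA - crB) + (cbA - cbB) * (cbA - cbB));
}

bool matchesKeyColor(RGB pixel, RGB key, float thresholdSensitivity = 0.2f) {
    // Compare chrominance only, so brightness changes don't affect the match.
    return chromaDistance(pixel, key) < thresholdSensitivity;
}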

Related

GPUImage Lookup Filter - creating a color depth greater than 512² colors

GPUImage's LookupFilter uses an RGB pixel map that's 512x512. When the filter executes, it compares a modified version of this image against the original and derives an image filter from the difference.
The filter code is pretty straightforward. Here's an extract so you can see what's going on:
void main()
{
    highp vec4 textureColor = texture2D(inputImageTexture, textureCoordinate);

    highp float blueColor = textureColor.b * 63.0;

    highp vec2 quad1;
    quad1.y = floor(floor(blueColor) / 8.0);
    quad1.x = floor(blueColor) - (quad1.y * 8.0);

    highp vec2 quad2;
    quad2.y = floor(ceil(blueColor) / 8.0);
    quad2.x = ceil(blueColor) - (quad2.y * 8.0);

    highp vec2 texPos1;
    texPos1.x = (quad1.x * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * textureColor.r);
    texPos1.y = (quad1.y * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * textureColor.g);

    highp vec2 texPos2;
    texPos2.x = (quad2.x * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * textureColor.r);
    texPos2.y = (quad2.y * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * textureColor.g);

    lowp vec4 newColor1 = texture2D(inputImageTexture2, texPos1);
    lowp vec4 newColor2 = texture2D(inputImageTexture2, texPos2);

    lowp vec4 newColor = mix(newColor1, newColor2, fract(blueColor));
    gl_FragColor = mix(textureColor, vec4(newColor.rgb, textureColor.w), intensity);
}
See where the filter map is dependent on this being a 512x512 image?
I'm looking at ways to 4x the color depth here, using a 1024x1024 source image instead, but I'm not sure how this lookup filter image would have originally been generated.
Can something like this be generated in code? If so, I realize it's a very broad question, but how would I go about doing that? If it can't be generated in code, what are my options?
Update:
Turns out the original LUT generation code was included in the header file all along. The part I'm questioning is this comment from that header:
Lookup texture is organised as 8x8 quads of 64x64 pixels representing all possible RGB colors:
How are 8x8 quads of 64x64 pixels a map of all possible RGB colors? 64³ = 262,144, but that only accounts for 1/64th of the presumed 24-bit capacity of RGB, which is 256³ (16,777,216). What's going on here? Am I missing the way this LUT works? How are we accounting for all possible RGB colors with only 1/64th of the data?
for (int by = 0; by < 8; by++) {
    for (int bx = 0; bx < 8; bx++) {
        for (int g = 0; g < 64; g++) {
            for (int r = 0; r < 64; r++) {
                image.setPixel(r + bx * 64, g + by * 64, qRgb((int)(r * 255.0 / 63.0 + 0.5),
                                                              (int)(g * 255.0 / 63.0 + 0.5),
                                                              (int)((bx + by * 8.0) * 255.0 / 63.0 + 0.5)));
            }
        }
    }
}
I'm not quite sure what problem you are actually having. When you say you want "4x the color depth", what do you actually mean? Color depth normally means the number of bits per color channel (or per pixel), which is totally independent of the resolution of the image.
In terms of lookup table accuracy (which is resolution dependent), assuming you are using bilinear filtered texture inputs from the original texture, and filtered lookups into the transform table, then you are already linearly interpolating between samples in the lookup table. Interpolation of color channels will be at higher precision than the storage format; e.g. often fp16 equivalent, even for textures stored at 8-bit per pixel.
Unless you have a significant amount of non-linearity in your color transform (not that common), adding more samples to the lookup table is unlikely to make a significant difference to the output; the interpolation will already be doing a reasonably good job of filling in the gaps.
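As a tiny illustration of that point (assuming linear filtering between adjacent 8-bit table entries, with illustrative values):

#include <cstdint>
#include <cstdio>

int main() {
    uint8_t sampleA = 100, sampleB = 101;   // adjacent entries in the lookup table
    float   frac    = 0.37f;                // sub-texel position of the lookup
    float   result  = sampleA + (sampleB - sampleA) * frac;  // filtered read, done in float
    std::printf("interpolated value = %f\n", result);        // 100.37, finer than the 8-bit step
}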
Lev Zelensky provided the original work for this, so I'm not as familiar with how this works internally, but you can look at the math being performed in the shader to get an idea of what's going on.
In the 512x512 lookup, you have an 8x8 grid of cells. Within those cells, you have a 64x64 image patch. The red values go from 0 to 255 (0.0 to 1.0 in normalized values) going from left to right in that patch, and the green values go from 0 to 255 going down. That means that there are 64 steps in red, and 64 steps in green.
Each cell then appears to increase the blue value as you progress down the patches, left to right, top to bottom. With 64 patches, that gives you 64 blue values to match the 64 red and green ones. That gives you equal coverage across the RGB values in all channels.
So, if you wanted to double the number of color steps, you'd have to double the patch size to 128x128 and have 128 patches. The layout would have to be a rectangle rather than a square, since 128 doesn't have an integer square root. Just going to 1024x1024 might let you double the color depth in the red and green channels, but blue would then be at half their depth. Balancing the three out would be a little trickier than just doubling the image size.
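If you do want to generate such a table yourself, here is a sketch in C++/OpenCV that mirrors the Qt loop above but parameterizes the steps per channel. This is not GPUImage's own generation tool, and the 16x8 patch layout for a 128-step table is my assumption about one workable rectangular arrangement.

#include <opencv2/opencv.hpp>

cv::Mat3b makeIdentityLut(int steps, int gridCols, int gridRows)
{
    CV_Assert(gridCols * gridRows == steps);           // one patch per blue level
    cv::Mat3b lut(gridRows * steps, gridCols * steps);
    for (int by = 0; by < gridRows; ++by)
        for (int bx = 0; bx < gridCols; ++bx)
            for (int g = 0; g < steps; ++g)
                for (int r = 0; r < steps; ++r) {
                    int blueIndex = by * gridCols + bx;
                    lut(g + by * steps, r + bx * steps) = cv::Vec3b(
                        cv::saturate_cast<uchar>(blueIndex * 255.0 / (steps - 1) + 0.5), // B
                        cv::saturate_cast<uchar>(g * 255.0 / (steps - 1) + 0.5),         // G
                        cv::saturate_cast<uchar>(r * 255.0 / (steps - 1) + 0.5));        // R
                }
    return lut;
}

int main()
{
    cv::imwrite("identity_lut_512.png", makeIdentityLut(64, 8, 8));        // standard 512x512
    // A 128-step table needs a 16x8 (or 8x16) grid of 128x128 patches => 2048x1024.
    cv::imwrite("identity_lut_2048x1024.png", makeIdentityLut(128, 16, 8));
}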

How to reproduce Photoshop's multiply blending in OpenCV?

I'm trying to reproduce Photoshop's multiply blend mode in OpenCV. Equivalents to this would be what you find in GIMP, or when you use the CIMultiplyBlendMode in Apple's CoreImage framework.
Everything I read online suggests that multiply blending is accomplished simply by multiplying the channels of the two input images (i.e., Blend = AxB). And, this works, except for the case(s) where alpha is < 1.0.
You can test this very simply in GIMP/PhotoShop/CoreImage by creating two layers/images, filling each with a different solid color, and then modifying the opacity of the first layer. (BTW, when you modify alpha, the operation is no longer commutative in GIMP for some reason.)
A simple example: if A = (0,0,0,0) and B = (0.4,0,0,1.0), and C = AxB, then I would expect C to be (0,0,0,0). This is simple multiplication. But this is not how this blend is implemented in practice. In practice, C = (0.4,0,0,1.0), or C = B.
The bottom line is this: I need to figure out the formula for the multiply blend mode (which is clearly more than AxB) and then implement it in OpenCV (which should be trivial once I have the formula).
Would appreciate any insights.
Also, for reference, here are some links which show multiply blend as being simply AxB:
How does photoshop blend two images together
Wikipedia - Blend Modes
Photoshop Blend Modes
Here is an OpenCV solution based on the source code of GIMP, specifically the function gimp_operation_multiply_mode_process_pixels.
NOTE
Instead of looping over all pixels this could be vectorized, but I followed the steps of GIMP.
Input images must be of type CV_8UC3 or CV_8UC4.
It also supports an opacity value, which must be in [0, 255].
The original GIMP implementation also supports a mask; it can be trivially added to the code if needed.
This implementation is in fact not symmetrical, and it reproduces the strange behaviour you describe.
Code:
#include <opencv2/opencv.hpp>
#include <algorithm>

using namespace cv;

Mat blend_multiply(const Mat& level1, const Mat& level2, uchar opacity)
{
    CV_Assert(level1.size() == level2.size());
    CV_Assert(level1.type() == level2.type());
    CV_Assert(level1.channels() == level2.channels());

    // Get 4-channel float images
    Mat4f src1, src2;

    if (level1.channels() == 3)
    {
        Mat4b tmp1, tmp2;
        cvtColor(level1, tmp1, COLOR_BGR2BGRA);
        cvtColor(level2, tmp2, COLOR_BGR2BGRA);
        tmp1.convertTo(src1, CV_32F, 1. / 255.);
        tmp2.convertTo(src2, CV_32F, 1. / 255.);
    }
    else
    {
        level1.convertTo(src1, CV_32F, 1. / 255.);
        level2.convertTo(src2, CV_32F, 1. / 255.);
    }

    Mat4f dst(src1.rows, src1.cols, Vec4f(0., 0., 0., 0.));

    // Loop over every pixel
    float fopacity = opacity / 255.f;
    float comp_alpha, new_alpha;

    for (int r = 0; r < src1.rows; ++r)
    {
        for (int c = 0; c < src1.cols; ++c)
        {
            const Vec4f& v1 = src1(r, c);
            const Vec4f& v2 = src2(r, c);
            Vec4f& out = dst(r, c);

            comp_alpha = std::min(v1[3], v2[3]) * fopacity;
            new_alpha = v1[3] + (1.f - v1[3]) * comp_alpha;

            if ((comp_alpha > 0.) && (new_alpha > 0.))
            {
                float ratio = comp_alpha / new_alpha;

                out[0] = std::max(0.f, std::min(v1[0] * v2[0], 1.f)) * ratio + (v1[0] * (1.f - ratio));
                out[1] = std::max(0.f, std::min(v1[1] * v2[1], 1.f)) * ratio + (v1[1] * (1.f - ratio));
                out[2] = std::max(0.f, std::min(v1[2] * v2[2], 1.f)) * ratio + (v1[2] * (1.f - ratio));
            }
            else
            {
                out[0] = v1[0];
                out[1] = v1[1];
                out[2] = v1[2];
            }

            out[3] = v1[3];
        }
    }

    Mat3b dst3b;
    Mat4b dst4b;
    dst.convertTo(dst4b, CV_8U, 255.);
    cvtColor(dst4b, dst3b, COLOR_BGRA2BGR);

    return dst3b;
}

int main()
{
    Mat3b layer1 = imread("path_to_image_1");
    Mat3b layer2 = imread("path_to_image_2");

    Mat blend = blend_multiply(layer1, layer2, 255);

    return 0;
}
I managed to sort this out. Feel free to comment with any suggested improvements.
First, I found a clue as to how to implement the multiply function in this post:
multiply blending
And here's a quick OpenCV implementation in C++.
Mat MultiplyBlend(const Mat& cvSource, const Mat& cvBackground) {
    // assumption: cvSource and cvBackground are of type CV_8UC4
    // formula: (cvSource.rgb * cvBackground.rgb * cvSource.a) + (cvBackground.rgb * (1 - cvSource.a))

    // Replicate the source alpha channel into a 3-channel image
    Mat cvAlpha(cvSource.size(), CV_8UC3, Scalar::all(0));
    Mat input[] = { cvSource };
    int from_to[] = { 3,0, 3,1, 3,2 };
    mixChannels(input, 1, &cvAlpha, 1, from_to, 3);

    Mat cvBackgroundCopy;
    Mat cvSourceCopy;
    cvtColor(cvSource, cvSourceCopy, COLOR_RGBA2RGB);
    cvtColor(cvBackground, cvBackgroundCopy, COLOR_RGBA2RGB);

    // A = cvSource.rgb * cvBackground.rgb * cvSource.a
    Mat cvBlendResultLeft;
    multiply(cvSourceCopy, cvBackgroundCopy, cvBlendResultLeft, 1.0 / 255.0);
    multiply(cvBlendResultLeft, cvAlpha, cvBlendResultLeft, 1.0 / 255.0);

    // invert alpha
    bitwise_not(cvAlpha, cvAlpha);

    // B = cvBackground.rgb * (1 - cvSource.a)
    Mat cvBlendResultRight;
    multiply(cvBackgroundCopy, cvAlpha, cvBlendResultRight, 1.0 / 255.0);

    // A + B  (cv::Mat manages its own memory, so no explicit deletes are needed)
    Mat cvBlendResult;
    add(cvBlendResultLeft, cvBlendResultRight, cvBlendResult);
    cvtColor(cvBlendResult, cvBlendResult, COLOR_RGB2RGBA);
    return cvBlendResult;
}
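For reference, here is a single-pixel C++ sketch of the formula from the comment above (my own illustration, not part of either answer). It shows why a fully transparent source simply returns the background, which matches the observation in the question.

// Per-pixel form of:  out.rgb = src.rgb * bg.rgb * src.a + bg.rgb * (1 - src.a)
// With src.a == 0 the first term vanishes and out == bg, i.e. the
// A = (0,0,0,0), B = (0.4,0,0,1.0) case yields C = B.
struct RGBA { float r, g, b, a; };

RGBA multiplyBlendPixel(RGBA src, RGBA bg)
{
    RGBA out;
    out.r = src.r * bg.r * src.a + bg.r * (1.0f - src.a);
    out.g = src.g * bg.g * src.a + bg.g * (1.0f - src.a);
    out.b = src.b * bg.b * src.a + bg.b * (1.0f - src.a);
    out.a = bg.a;   // assumption: background alpha is carried through unchanged
    return out;
}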

Image processing: interpolation using intensity values of pixels in the input image

When we do image interpolation, I assume we should use the intensity values of pixels in the input image.
(A)
I am reading the cubic interpolation code from GPU Gems, Chapter 24, "High-Quality Filtering". Here is a snippet of their code:
Example 24-9. Filtering Four Texel Rows, Then Filtering the Results as a Column
float4 texRECT_bicubic(uniform samplerRECT tex,
                       uniform samplerRECT kernelTex,
                       float2 t)
{
    float2 f = frac(t);  // we want the sub-texel portion
    float4 t0 = cubicFilter(kernelTex, f.x,
                            texRECT(tex, t + float2(-1, -1)),
                            texRECT(tex, t + float2(0, -1)),
                            texRECT(tex, t + float2(1, -1)),
                            texRECT(tex, t + float2(2, -1)));
Since they get the sub-texel portion from frac(t), "t" is not exactly on a pixel position of the input image.
Then how can "t" be used directly to sample intensity values from the original image, as in "texRECT(tex, t + float2(-1, -1))"?
Personally, I think we should use
t - frac(t)
(B)
The same issue appears in an example from "Zoom An Image With Different Interpolation Types".
Their snippet of "GLSL shader code for Bi-Cubic Interpolation" is:
float a = fract( TexCoord.x * fWidth );  // get the decimal part
float b = fract( TexCoord.y * fHeight ); // get the decimal part

for( int m = -1; m <= 2; m++ )
{
    for( int n = -1; n <= 2; n++ )
    {
        vec4 vecData = texture2D(textureSampler,
                                 TexCoord + vec2(texelSizeX * float( m ),
                                                 texelSizeY * float( n )));
I think we should use:
TexCoord - vec2(a,b)
and then apply the offsets of m and n.
(C) Now I am confused. I thought we were supposed to use the intensity values of "exact" pixel positions in the input image.
Which way should we use?
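For reference, here is an illustrative 1-D C++ sketch of how cubic filtering is commonly structured (not the GPU Gems code): the integer part of the coordinate selects the four source samples, and only the fractional part drives the kernel weights. The Catmull-Rom kernel and all names are my own assumptions.

#include <algorithm>
#include <cmath>
#include <vector>

float cubicWeight(float x) {
    // Catmull-Rom weights as one common choice of cubic kernel (assumption).
    x = std::fabs(x);
    if (x < 1.0f) return 1.5f * x * x * x - 2.5f * x * x + 1.0f;
    if (x < 2.0f) return -0.5f * x * x * x + 2.5f * x * x - 4.0f * x + 2.0f;
    return 0.0f;
}

float cubicSample1D(const std::vector<float>& samples, float t) {
    int   base = static_cast<int>(std::floor(t));   // integer sample index
    float f    = t - base;                          // sub-texel portion, like frac(t)
    float result = 0.0f;
    for (int i = -1; i <= 2; ++i) {
        int idx = std::min(std::max(base + i, 0), static_cast<int>(samples.size()) - 1);
        result += samples[idx] * cubicWeight(f - i);  // weight depends only on the fraction
    }
    return result;
}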

Converting YUV422 to RGB using GPU shader HLSL

I'm considering performing the color space conversion from YUV422 to RGB using HLSL. Four bytes of YUYV yield two three-byte RGB values; for example, Y1 U Y2 V gives R1G1B1 (left pixel) and R2G2B2 (right pixel). Given that the texture coordinates in the pixel shader increase gradually, how could I differentiate between the texture coordinates for the left pixels (i.e., all R1G1B1) and the texture coordinates for the right pixels (i.e., all R2G2B2)? This way I could render all R1G1B1 and all R2G2B2 on a single texture instead of two.
Thanks!
Not sure which version of DirectX you use, but here is the version I use for DX11 (note that in this case I send the YUV data in a StructuredBuffer, which saves me from dealing with row stride). You can of course apply the same technique with your YUV data as a texture (with a few small changes to the code below).
Here is the pixel shader code (I assume your render target is the same size as your input image, and that you render a full-screen quad/triangle).
StructuredBuffer<uint> yuy;

int w;
int h;

struct psInput
{
    float4 p : SV_Position;
    float2 uv : TEXCOORD0;
};

float4 PS(psInput input) : SV_Target
{
    // Calculate pixel location within buffer (if you use a texture, change the lookup here)
    uint2 xy = input.p.xy;
    uint p = (xy.x) + (xy.y * w);
    uint pixloc = p / 2;
    uint pixdata = yuy[pixloc];

    // Since pixdata is packed, use some bitshifts to remove the non-useful data
    uint v  = (pixdata & 0xff000000) >> 24;
    uint y1 = (pixdata & 0xff0000) >> 16;
    uint u  = (pixdata & 0xff00) >> 8;
    uint y0 = pixdata & 0x000000FF;

    // Check if you are the left or right pixel
    uint y = p % 2 == 0 ? y0 : y1;

    // Convert yuv to rgb
    float cb = u;
    float cr = v;

    float r = (y + 1.402 * (cr - 128.0));
    float g = (y - 0.344 * (cb - 128.0) - 0.714 * (cr - 128));
    float b = (y + 1.772 * (cb - 128));

    return float4(r, g, b, 1.0f) / 256.0f;
}
Hope that helps.

GPUImage - How to specify filter size for GPUImageMedianFilter and GPUImageGaussianBlurFilter

Hi GPUImage community and Brad,
I would like to specify the filter size (radius) of the GPUImageMedianFilter
and GPUImageGaussianBlurFilter.
Does that require specifying GPU commands? Or can it be done through the GPUImage wrapper?
If so, how can I do that?
Thanks
This is probably not the place to ask a specific question about this framework, but I can answer you on this.
The GPUImageMedianFilter is a hardcoded 3x3 median filter based on the article "A Fast, Small-Radius GPU Median Filter" by Morgan McGuire in the ShaderX6 book. More on this can be found here, including larger-radius versions of this. Despite being the fastest implementation of this that I have found, it is still incredibly slow to run on all but the fastest iOS devices, so increasing the sampling area will only slow this down further.
The GPUImageGaussianBlurFilter does a 9-hit simple Gaussian blur in two separated passes. The blurSize property allows you to expand or contract the sampling area slightly, but if you go beyond a multiplier of 1.5, you'll start seeing fringe artifacts due to too few samples being used to blur over a large area. I'm working on a couple of ways of expanding the blur area in a performant manner, but that is the limitation of this particular filter.
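To illustrate where those numbers come from (a sketch of the general technique, not GPUImage's actual code), here is how 9 tap weights and offsets for one pass of a separable Gaussian might be computed, including a blurSize-style multiplier on the offsets. Stretching the offsets without adding taps spreads the same 9 samples over a wider area, which is where the fringe artifacts come from.

#include <cmath>
#include <cstdio>

int main() {
    const int   taps  = 9;                 // sample offsets -4..+4 in texels
    const float sigma = 2.0f;              // assumption: a sigma that suits 9 taps
    const float blurSizeMultiplier = 1.5f; // stretch factor applied to the offsets

    float weights[taps], sum = 0.0f;
    for (int i = 0; i < taps; ++i) {
        int offset = i - taps / 2;
        weights[i] = std::exp(-(offset * offset) / (2.0f * sigma * sigma));
        sum += weights[i];
    }
    for (int i = 0; i < taps; ++i) {
        weights[i] /= sum;                 // normalize so the weights sum to 1
        float texelOffset = (i - taps / 2) * blurSizeMultiplier;
        std::printf("tap %d: offset %+0.1f texels, weight %.4f\n", i, texelOffset, weights[i]);
    }
}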
Here's how to calculate the median within the pixel-neighborhood radius of your choosing:
kernel vec4 medianUnsharpKernel(sampler u) {
    vec4 pixel = unpremultiply(sample(u, samplerCoord(u)));
    vec2 xy = destCoord();
    int radius = 3;
    int bounds = (radius - 1) / 2;
    vec4 sum = vec4(0.0);
    for (int i = (0 - bounds); i <= bounds; i++)
    {
        for (int j = (0 - bounds); j <= bounds; j++)
        {
            sum += unpremultiply(sample(u, samplerTransform(u, vec2(xy + vec2(i, j)))));
        }
    }
    vec4 mean = vec4(sum / vec4(pow(float(radius), 2.0)));
    float mean_avg = float(mean);
    float comp_avg = 0.0;
    vec4 comp = vec4(0.0);
    vec4 median = mean;
    for (int i = (0 - bounds); i <= bounds; i++)
    {
        for (int j = (0 - bounds); j <= bounds; j++)
        {
            comp = unpremultiply(sample(u, samplerTransform(u, vec2(xy + vec2(i, j)))));
            comp_avg = float(comp);
            median = (comp_avg < mean_avg) ? max(median, comp) : median;
        }
    }
    return premultiply(vec4(vec3(abs(pixel.rgb - median.rgb)), 1.0));
}
A brief description of the steps
1. Calculate the mean of the values of the pixels surrounding the source pixel in a 3x3 neighborhood;
2. Find the maximum pixel value of all pixels in the same neighborhood that are less than the mean.
3. [OPTIONAL] Subtract the median pixel value from the source pixel value for edge detection.
If you're using the median value for edge detection, there are a couple of ways to modify the above code for better results, namely hybrid median filtering and truncated median filtering (a substitute for, and a better form of, 'mode' filtering). If you're interested, please ask.

Resources