GLSL - Change specific Color of Texture to another Color - ios

My textures consist of 4 different colors. I want to change each color to a different color. I tried it the following way:
precision mediump float;
varying lowp vec4 vColor;
varying highp vec2 vUv;
uniform sampler2D texture;
bool inRange( float c1, float c2 ) {
    return abs( c1 - c2 ) < 0.01;
}
void main() {
    vec4 c = texture2D(texture, vUv);
    if ( inRange( c.r, 238.0/255.0 ) && inRange( c.g, 255.0/255.0 ) && inRange( c.b, 84.0/255.0 ) )
        c = vec4( 254.0/255.0, 254.0/255.0, 247.0/255.0, 1.0 );
    else if ( inRange( c.r, 15.0/255.0 ) && inRange( c.g, 59.0/255.0 ) && inRange( c.b, 5.0/255.0 ) )
        c = vec4( 65.0/255.0, 65.0/255.0, 65.0/255.0, 1.0 );
    else if ( inRange( c.r, 157.0/255.0 ) && inRange( c.g, 184.0/255.0 ) && inRange( c.b, 55.0/255.0 ) )
        c = vec4( 254.0/255.0, 247.0/255.0, 192.0/255.0, 1.0 );
    else if ( inRange( c.r, 107.0/255.0 ) && inRange( c.g, 140.0/255.0 ) && inRange( c.b, 38.0/255.0 ) )
        c = vec4( 226.0/255.0, 148.0/255.0, 148.0/255.0, 1.0 );
    gl_FragColor = c;
}
This works, but it's terribly slow. I'm running this on an iPhone. The calculations aren't that hard, or am I missing something?
Is there a faster way to do this?

Branches are bad for shader performance. Normally, the GPU executes multiple fragment shaders (each for their own fragment) at once. They all run in lockstep -- SIMD processing means that in effect all parallel fragment processors are running the same code but operating on different data. When you have conditionals, it's possible for different fragments to be on different code paths, so you lose SIMD parallelism.
One of the best performance tricks for this sort of application is using a Color Lookup Table. You provide a 3D texture (the lookup table) and use the GLSL texture3D function to look up into it -- the input coordinates are the R, G, and B values of your original color, and the output is the replacement color.
This is very fast, even on mobile hardware -- the fragment shader doesn't have to do any computation, and the texture lookup is usually cached before the fragment shader even runs.
Constructing a lookup table texture is easy. Conceptually, it's a cube that encodes every possible RGB value (x axis is R from 0.0 to 1.0, y axis is G, z axis is B). If you organize it as a 2D image, you can then open it in your favorite image editor and apply any color transformation filters you like to it. The filtered image is your conversion lookup table. There's a decent writeup on the technique here and another in GPU Gems 2. A more general discussion of the technique, applied using Core Image filters, is in Apple's documentation library.
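To make the lookup-table idea concrete, here is a minimal CPU-side sketch in Python (not shader code). It builds a small N x N x N table that starts as the identity mapping, overwrites the cell containing one source color, and then looks colors up by nearest cell. The names build_identity_lut, set_color and lookup, and the table size of 16, are assumptions chosen for illustration; a real implementation would upload the table as a texture and let the GPU do the sampling.

```python
# Sketch of a color lookup table (LUT) on the CPU; all names are hypothetical.
N = 16  # cells per axis (assumed size for this example)

def cell(v):
    """Map a channel value in [0, 1] to the nearest LUT cell index."""
    return min(N - 1, max(0, round(v * (N - 1))))

def build_identity_lut():
    """lut[r][g][b] holds the output color for that cell (identity to start)."""
    s = N - 1
    return [[[(r / s, g / s, b / s) for b in range(N)]
             for g in range(N)]
            for r in range(N)]

def set_color(lut, src, dst):
    """Make the cell containing `src` map to `dst`."""
    r, g, b = (cell(v) for v in src)
    lut[r][g][b] = dst

def lookup(lut, color):
    """Nearest-cell lookup: one table read, no comparisons against colors."""
    r, g, b = (cell(v) for v in color)
    return lut[r][g][b]

lut = build_identity_lut()
yellow = (238 / 255, 255 / 255, 84 / 255)   # first source color in the question
cream = (254 / 255, 254 / 255, 247 / 255)   # its replacement
set_color(lut, yellow, cream)
print(lookup(lut, yellow))
```

Colors that fall in untouched cells come back quantized to the cell grid, which is why real tables are usually sampled with interpolation rather than nearest-cell reads.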

EDIT: It was confirmed by the asker that it is the presence of any branches that causes the incredible slowdown. I will provide an attempt at a branchless solution.
Well, if branches (including using the ternary "?" operator) are unusable, you can only use arithmetic.
A possible solution (which is hideous from a maintenance perspective, but might fit your need) is to map your input color to output color using polynomials that give desired output for the 4 colors you care about. I treated the 3 RGB color channels separately and plugged in the input/output points into wolfram alpha with a cubic fit (example for the red channel here: http://www.wolframalpha.com/input/?i=cubic+fit+%7B238.0%2C+254.0%7D%2C%7B15.0%2C+65.0%7D%2C+%7B157.0%2C+254.0%7D%2C+%7B107.0%2C+226.0%7D). You could use any polynomial fit program for this purpose.
The code for the red channel is then:
float r = 255.0 * c.r; // the fit above was done on the 0-255 scale
float redResult = (20.6606 + 3.15457 * r - 0.0135167 * r*r + 0.0000184102 * r*r*r) / 255.0;
Rinse and repeat the process with the green and blue color channels and you have your shader. Note that you may want to specify the very small coefficients in scientific notation to retain accuracy... I don't know how your particular driver handles floating-point literals.
Even then you will probably have precision issues, but it's worth a shot.
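As a sanity check on the fit, the following Python sketch evaluates the red-channel cubic at the four source values. It assumes the coefficients quoted above and that inputs and outputs are on the 0 to 255 scale used for the fit; the function name red_poly is made up for this example.

```python
# Coefficients of the red-channel cubic quoted above (0..255 scale).
A0, A1, A2, A3 = 20.6606, 3.15457, -0.0135167, 0.0000184102

def red_poly(r255):
    """Evaluate the cubic in Horner form; input and output are in 0..255."""
    return A0 + r255 * (A1 + r255 * (A2 + r255 * A3))

# (source red, desired red) pairs taken from the original shader.
pairs = [(238, 254), (15, 65), (157, 254), (107, 226)]
for src, want in pairs:
    print(src, want, round(red_poly(src), 3))
```

In double precision the rounded coefficients land well within one 8-bit step of each target. Values between the fitted points are not controlled by the fit, and a shader running at mediump precision may do noticeably worse.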
Another possibility is using an approximate bump function (I say approximate, since you don't actually care about the smoothness constraints). You just want a value that's 1 at the color you care about and 0 everywhere far enough away. Say you have a three-component bump function, bump3, that takes a vec3 for the location of the bump and a vec3 for the location at which to evaluate the function. Then you can rewrite your first conditional from:
if ( inRange( c.r, 238.0/255.0 ) && inRange( c.g, 255.0/255.0 ) && inRange( c.b, 84.0/255.0 ) )
c = vec4( 254.0/255.0, 254.0/255.0, 247.0/255.0, 1.0 );
to:
vec3 colorIn0 = vec3(238.0/255.0, 255.0/255.0, 84.0/255.0);
vec3 colorOut0 = vec3(254.0/255.0, 254.0/255.0, 247.0/255.0);
result.rgb = c.rgb + bump3(colorIn0, c.rgb) * (colorOut0-colorIn0);
If max/min are fast on your hardware (they might be full branches under the hood :( ), a possible quick and dirty bump3() implementation might be:
float bump3(vec3 b, vec3 p) {
    vec3 diff = abs(b-p);
    return max(0.0, 1.0 - 255.0*(diff.x + diff.y + diff.z));
}
Other possibilities for bump3 might be abusing smoothstep (again, if it is fast on your hardware) or using the exponential.
The polynomial approach has the added (incidental) benefit of generalizing your map to more than just the four colors, but requires many arithmetic operations, is a maintenance nightmare, and likely suffers from precision issues. The bump function approach, on the other hand, should produce the same results as your current shader, even on input that is not one of those four colors, and is much more readable and maintainable (adding another color pair is trivial, compared to the polynomial approach). However, in the implementation I gave, it uses a max, which might be a branch under the hood (I hope not, geez).
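The bump-function idea can be tried out on the CPU before porting it to a shader. The Python sketch below mirrors the bump3 function above and applies all four color pairs from the question without any color comparisons; the helper names rgb, PAIRS and recolor are hypothetical and exist only for this illustration.

```python
# CPU sketch of the branch-free bump-function recoloring; names are hypothetical.
def bump3(b, p):
    """1.0 when p equals b, falling linearly to 0.0 once the summed
    per-channel difference reaches one 8-bit step (1/255)."""
    diff = sum(abs(bi - pi) for bi, pi in zip(b, p))
    return max(0.0, 1.0 - 255.0 * diff)

def rgb(r, g, b):
    """Convert 8-bit channel values to the 0..1 range used in shaders."""
    return (r / 255.0, g / 255.0, b / 255.0)

# (input color, output color) pairs from the original shader.
PAIRS = [
    (rgb(238, 255, 84), rgb(254, 254, 247)),
    (rgb(15, 59, 5),    rgb(65, 65, 65)),
    (rgb(157, 184, 55), rgb(254, 247, 192)),
    (rgb(107, 140, 38), rgb(226, 148, 148)),
]

def recolor(c):
    """Add each weighted offset to the input; at most one weight is non-zero
    because the four input colors are far apart."""
    out = list(c)
    for col_in, col_out in PAIRS:
        w = bump3(col_in, c)
        for i in range(3):
            out[i] += w * (col_out[i] - col_in[i])
    return tuple(out)

print(recolor(rgb(15, 59, 5)))
print(recolor(rgb(10, 20, 30)))
```

Note that this bump is narrower than the 0.01 tolerance of inRange: it reaches zero once the summed difference is 1/255. If the texture is filtered or compressed, lowering the 255.0 factor widens the bump.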
Original answer below
It would be good to know how you are getting timing information so we can be sure it's this shader that's slow (you could test this by just making this a pass-through shader as a quick hack... I recommend getting used to using a profiler though). It seems exceedingly odd that such a straightforward shader is slow.
Otherwise, if your texture truly only has those 4 colors (and it is guaranteed), then you can trivially take the number of inRange calls down from 12 to 3 by removing the if from the last branch (just make it an else), and then only testing the r value of c. I don't know how the iPhone's GLSL optimizer works, but you could then try replacing the if statements with ternary operators and see if that makes a difference. Those are the only changes I can think of, and unfortunately you can't do the definite optimization if your textures aren't guaranteed to only have those 4 colors.
I would again like to point out that you should make sure this shader is causing the slowdown before trying to optimize.

Related

unity mvp Matrix on ios

I am working on a water simulation and need to sample _CameraDepthTexture to get the opaque depth. It works well on Windows, but the shader gets a different depth on iOS.
vert:
o.pos = mul (UNITY_MATRIX_MVP, v.vertex);
o.ref = ComputeScreenPos(o.pos);
COMPUTE_EYEDEPTH(o.ref.z);
frag:
uniform sampler2D_float _CameraDepthTexture;
float raw_depth = UNITY_SAMPLE_DEPTH(tex2Dproj(_CameraDepthTexture, UNITY_PROJ_COORD(uv2)));
On Windows, raw_depth is around 0.98, but on iOS it is around 0.51.
I guess this difference is related to the MVP matrix differing between platforms.

tex2Dproj equivalent in Metal iOS

How do you do hardware accelerated texture projection in Metal? I cannot find any reference or resource that describes how to do it.
You just do the divide yourself.
OpenGL:
a = tex2Dproj( texture, texcoord.xyzw )
b = tex2Dproj( texture, texcoord.xyz )
Metal equivalent:
a = texture.sample( sampler, texcoord.xy/texcoord.w )
b = texture.sample( sampler, texcoord.xy/texcoord.z )
(Choose 'a' or 'b' depending on the type of projection you are doing, more commonly it is 'a')

Updating float4 declaration from dx9 to dx11

I was given a shader that I'm trying to update to be compatible with the newest Unity 5 (presumably dx11). I don't understand how the basic float4 instantiation from dx9 worked. Can someone help me understand the following syntax and then provide an equivalent dx11 syntax?
I understand that float4 normally takes x,y,z,w or xyz,w as arguments, but what did a single float argument do? Did float4(0.01) make {.01,0,0,0} or {.01,.01,.01,.01}?
Original code from the shader:
float4 Multiply19 = float4( 0.01 ) * float4( 0 );
It should make a new float4 with all members (xyzw) set to 0.01 and then multiply all that by 0, effectively making Multiply19 a (0, 0, 0, 0) float4.

opengl: can I write to multiple locations in the output buffer when using a shader?

I have a few equations that I have running in a CPU-based program to process images for iOS. The output is in the form:
for (y = 0; y < rows; ++y){
    for (x = 0; x < cols; ++x){
        <do math>
        outputImage[y*cols + x] += <some result>
        outputImage[y*cols + (x+1)] += <some result>
        outputImage[(y+1)*cols + x] += <some result>
    }
}
I think that this code can (and should) be thrown onto the GPU, probably via GPUImage. The trick is the outputs-- from my understanding, I can only put the results of a shader into gl_FragColor. Is it possible to write a fragment shader that puts results into more than one pixel on the output? Where can I find an example of that technique?
Is it possible to write a fragment shader that puts results into more
than one pixel on the output?
No. Shaders are designed to work individually. That is why they are so fast.
You should refactor your algorithm to be "shader friendly". Try to extract the inputs so they could feed the algorithm calculating a single value for a single fragment. Try to avoid branching and looping, otherwise it might be a good idea to keep the calculations on the CPU.
Assuming <do math> takes x and y as an input, these could be obtained from gl_FragCoord. And if <some result> is an output of <do math> your shader program could look something like this:
vec4 location = getLocation(gl_FragCoord);
vec4 sum = do_math(location.x, location.y); // accumulate in a local; gl_FragColor starts out undefined
sum += do_math(location.x - 1.0, location.y);
sum += do_math(location.x, location.y - 1.0);
gl_FragColor = sum;
Note the subtraction instead of addition. This way each fragment computes its own value completely instead of modifying its neighbours.
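The change from writing into neighbours ("scatter") to reading from neighbours ("gather") can be checked on the CPU. The Python sketch below is an illustration under assumptions: do_math is a stand-in that returns three made-up contributions, the grid size is arbitrary, and writes that would fall outside the image are simply dropped.

```python
# Scatter (CPU loop from the question) vs. gather (shader-style) on a tiny grid.
ROWS, COLS = 4, 5  # arbitrary size for the demonstration

def do_math(x, y):
    """Stand-in for <do math>: returns the three contributions that the
    original loop adds to (x, y), (x+1, y) and (x, y+1)."""
    return (x + y, 2 * x - y, x * y + 1)

def scatter():
    """The original formulation: each step writes to three output pixels."""
    out = [[0] * COLS for _ in range(ROWS)]
    for y in range(ROWS):
        for x in range(COLS):
            here, right, below = do_math(x, y)
            out[y][x] += here
            if x + 1 < COLS:
                out[y][x + 1] += right
            if y + 1 < ROWS:
                out[y + 1][x] += below
    return out

def gather_pixel(x, y):
    """What one fragment would compute: its own term plus the terms that
    its left and upper neighbours would have scattered into it."""
    total = do_math(x, y)[0]
    if x - 1 >= 0:
        total += do_math(x - 1, y)[1]
    if y - 1 >= 0:
        total += do_math(x, y - 1)[2]
    return total

def gather():
    return [[gather_pixel(x, y) for x in range(COLS)] for y in range(ROWS)]

print(scatter() == gather())
```

The cost of the gather form is that do_math runs up to three times per pixel, which is usually an acceptable trade on a GPU because every pixel can then be computed independently.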

Converting RGB to grayscale/intensity

When converting from RGB to grayscale, it is said that specific weights to channels R, G, and B ought to be applied. These weights are: 0.2989, 0.5870, 0.1140.
It is said that the reason for this is different human perception/sensibility towards these three colors. Sometimes it is also said these are the values used to compute NTSC signal.
However, I didn't find a good reference for this on the web. What is the source of these values?
See also these previous questions: here and here.
The specific numbers in the question are from CCIR 601 (see Wikipedia article).
If you convert RGB -> grayscale with slightly different numbers / different methods, you won't see much difference at all on a normal computer screen under normal lighting conditions -- try it.
Here are some more links on color in general:
Wikipedia Luma
Bruce Lindbloom's outstanding web site
chapter 4 on Color in the book by Colin Ware, "Information Visualization", isbn 1-55860-819-2; this long link to Ware in books.google.com may or may not work
cambridgeincolor: excellent, well-written "tutorials on how to acquire, interpret and process digital photographs using a visually-oriented approach that emphasizes concept over procedure"
Should you run into "linear" vs "nonlinear" RGB, here's part of an old note to myself on this. Repeat, in practice you won't see much difference.
### RGB -> ^gamma -> Y -> L*
In color science, the common RGB values, as in html rgb( 10%, 20%, 30% ),
are called "nonlinear" or
Gamma corrected.
"Linear" values are defined as
Rlin = R^gamma, Glin = G^gamma, Blin = B^gamma
where gamma is 2.2 for many PCs.
The usual R G B are sometimes written as R' G' B' (R' = Rlin ^ (1/gamma))
(purists tongue-click) but here I'll drop the '.
Brightness on a CRT display is proportional to RGBlin = RGB ^ gamma,
so 50% gray on a CRT is quite dark: .5 ^ 2.2 = 22% of maximum brightness.
(LCD displays are more complex;
furthermore, some graphics cards compensate for gamma.)
To get the measure of lightness called L* from RGB,
first divide R G B by 255, and compute
Y = .2126 * R^gamma + .7152 * G^gamma + .0722 * B^gamma
This is Y in XYZ color space; it is a measure of color "luminance".
(The real formulas are not exactly x^gamma, but close;
stick with x^gamma for a first pass.)
Finally,
L* = 116 * Y^(1/3) - 16
"... aspires to perceptual uniformity [and] closely matches human perception of lightness." --
Wikipedia Lab color space
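The RGB -> Y -> L* chain in the note can be written out as a short Python sketch. It assumes the simple power-law gamma of 2.2 from the note rather than an exact transfer curve, and it adds the linear branch that the CIE definition of L* uses for very dark values, which the note does not mention. The function names are made up for this example.

```python
GAMMA = 2.2  # simple power-law approximation, as in the note above

def luminance(r8, g8, b8):
    """Relative luminance Y in [0, 1] from 8-bit nonlinear RGB."""
    r, g, b = (v / 255.0 for v in (r8, g8, b8))
    return 0.2126 * r ** GAMMA + 0.7152 * g ** GAMMA + 0.0722 * b ** GAMMA

def lightness(y):
    """CIE L* from Y (white point Y = 1). The linear branch for very dark
    values is part of the CIE definition, not of the note above."""
    if y <= (6.0 / 29.0) ** 3:
        return (29.0 / 3.0) ** 3 * y
    return 116.0 * y ** (1.0 / 3.0) - 16.0

for rgb in [(0, 0, 0), (128, 128, 128), (255, 255, 255)]:
    y = luminance(*rgb)
    print(rgb, round(y, 4), round(lightness(y), 2))
```

Mid-gray (128) comes out at a luminance of roughly 0.22 but a lightness a little above 50, which is the sense in which L* tracks perception more closely than Y does.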
I found this publication referenced in an answer to a previous similar question. It is very helpful, and the page has several sample images:
Perceptual Evaluation of Color-to-Grayscale Image Conversions by Martin Čadík, Computer Graphics Forum, Vol 27, 2008
The publication explores several other methods to generate grayscale images with different outcomes:
CIE Y
Color2Gray
Decolorize
Smith08
Rasche05
Bala04
Neumann07
Interestingly, it concludes that there is no universally best conversion method, as each performed better or worse than others depending on input.
Here's some code in C to convert RGB to grayscale.
The usual weighting for RGB to grayscale conversion is approximately 0.30R + 0.59G + 0.11B.
These weights aren't absolutely critical, so you can play with them.
I have made them 0.25R + 0.5G + 0.25B, which produces a slightly darker image.
NOTE: The following code assumes an xRGB 32-bit pixel format.
unsigned int *pntrBWImage=(unsigned int*)..data pointer..; // assumes 4*width*height bytes, i.e. 4 bytes (32 bits) per pixel
unsigned int fourBytes;
unsigned char r, g, b;
for (int index = 0; index < width*height; index++)
{
    fourBytes = pntrBWImage[index]; // read one 32-bit xRGB pixel (4 bytes at a time)
    r = (fourBytes >> 16) & 0xFF;   // mask makes the 8-bit extraction explicit
    g = (fourBytes >> 8) & 0xFF;
    b = fourBytes & 0xFF;
    I_Out[index] = (r >> 2) + (g >> 1) + (b >> 2); // This runs in 0.00065s on my pc and produces slightly darker results
    //I_Out[index] = ((unsigned int)(r + g + b)) / 3; // This runs in 0.0011s on my pc and produces a pure average
}
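To see how far the shift-based weights drift from the standard ones, here is a small Python comparison. It is only a sketch: gray_shift mirrors the integer arithmetic of the C loop above, gray_601 uses the 0.299 / 0.587 / 0.114 weights from the question, and both function names are made up for this example.

```python
def gray_shift(r, g, b):
    """The 1/4, 1/2, 1/4 approximation using integer shifts, as in the C code."""
    return (r >> 2) + (g >> 1) + (b >> 2)

def gray_601(r, g, b):
    """Rec. 601 weights, rounded to the nearest integer."""
    return int(0.299 * r + 0.587 * g + 0.114 * b + 0.5)

for rgb in [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255)]:
    print(rgb, gray_shift(*rgb), gray_601(*rgb))
```

Because each shift truncates, pure white comes out as 253 rather than 255 in the shifted version. The different weights also change the relative brightness of saturated colors: pure blue maps to 63 with the shifts but to 29 with the Rec. 601 weights.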
Check out the Color FAQ for information on this. These values come from the standardization of RGB values that we use in our displays. Actually, according to the Color FAQ, the values you are using are outdated, as they are the values used for the original NTSC standard and not modern monitors.
What is the source of these values?
The "source" of the coefficients posted are the NTSC specifications which can be seen in Rec601 and Characteristics of Television.
The "ultimate source" are the CIE circa 1931 experiments on human color perception. The spectral response of human vision is not uniform. Experiments led to weighting of tristimulus values based on perception. Our L, M, and S cones (1) are sensitive to the light wavelengths we identify as "Red", "Green", and "Blue" (respectively), which is where the tristimulus primary colors are derived. (2)
The linear light (3) spectral weightings for sRGB (and Rec709) are:
Rlin * 0.2126 + Glin * 0.7152 + Blin * 0.0722 = Y
These are specific to the sRGB and Rec709 colorspaces, which are intended to represent computer monitors (sRGB) or HDTV monitors (Rec709), and are detailed in the ITU documents for Rec709 and also BT.2380-2 (10/2018)
FOOTNOTES
(1) Cones are the color detecting cells of the eye's retina.
(2) However, the chosen tristimulus wavelengths are NOT at the "peak" of each cone type - instead tristimulus values are chosen such that they stimulate one particular cone type substantially more than another, i.e. separation of stimulus.
(3) You need to linearize your sRGB values before applying the coefficients. I discuss this in another answer here.
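As a sketch of what footnote (3) means in practice, the Python below linearizes 8-bit sRGB values with the piecewise sRGB transfer function and then applies the coefficients given above. The function names are invented for this example.

```python
def srgb_to_linear(c):
    """Invert the sRGB transfer curve for one channel value in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def luminance_rec709(r8, g8, b8):
    """Relative luminance Y from 8-bit sRGB using the coefficients above."""
    r, g, b = (srgb_to_linear(v / 255.0) for v in (r8, g8, b8))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

print(round(luminance_rec709(255, 255, 255), 4))
print(round(luminance_rec709(128, 128, 128), 4))
```

Applying the coefficients directly to the encoded 8-bit values instead gives a luma-like quantity rather than luminance, which is the distinction the footnote is drawing.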
Starting a list to enumerate how different software packages do it. Here is a good CVPR paper to read as well.
FreeImage
#define LUMA_REC709(r, g, b) (0.2126F * r + 0.7152F * g + 0.0722F * b)
#define GREY(r, g, b) (BYTE)(LUMA_REC709(r, g, b) + 0.5F)
OpenCV
nVidia Performance Primitives
Intel Performance Primitives
Matlab
nGray = 0.299F * R + 0.587F * G + 0.114F * B;
These values vary from person to person, especially for people who are colorblind.
Is all this really necessary? Human perception and CRT vs LCD will vary, but the R G B intensity does not. Why not L = (R + G + B)/3 and set the new RGB to L, L, L?
