GLSL ES precision errors and overflows - iOS

I have the following fragment shader:
precision highp float;
varying highp vec2 vTexCoord;
uniform sampler2D uColorTexture;
void main () {
    highp vec4 tmp;
    tmp = ((texture2D (uColorTexture, vTexCoord) + texture2D (uColorTexture, vTexCoord)) / 2.0);
    gl_FragColor = tmp;
}
I know this shader does not make much sense, but it should still run correctly; I am using it to reproduce a problem. When I analyze this shader with the Xcode OpenGL ES analyzer it shows an error:
Overflow in implicit conversion, minimum range for lowp float is
(-2,2)
and it not only shows this error, the rendering output is also broken due to overflows. So it's not just a false positive from the analyzer; it actually overflows.
Can anyone explain to me why this produces an overflow although I chose highp everywhere?

You didn't really choose highp everywhere; from the GLSL ES spec chapter 8 (Built-in Functions):
Precision qualifiers for parameters and return values are not shown. For the texture functions, the precision of the return type matches the precision of the sampler type.
and from 4.5.3 (Default Precision Qualifiers):
The fragment language has the following predeclared globally scoped default precision statements: ... precision lowp sampler2D; ...
Which means that in your code texture2D (uColorTexture, vTexCoord) returns a lowp value; you are adding two of them, potentially resulting in a value of 2.0.
From 4.5.2 (Precision Qualifiers):
The required minimum ranges and precisions for precision qualifiers are: ... lowp (−2,2) ...
The parentheses in (-2,2) indicate an open range, meaning it includes values up to (but not including) 2.
So I think the fact that you're adding two lowp values together means you're overflowing. Try changing that line to:
tmp = texture2D(uColorTexture, vTexCoord)/2.0 + texture2D(uColorTexture, vTexCoord)/2.0;
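Alternatively, since (per the spec quote above) the precision of texture2D's return value matches the precision of the sampler, declaring the sampler highp should also avoid the overflow. A sketch of your shader with only that change (untested on your hardware):
precision highp float;
varying highp vec2 vTexCoord;
// Declaring the sampler highp makes texture2D() return highp as well,
// so the intermediate sum no longer has to fit into lowp's (-2, 2) range.
uniform highp sampler2D uColorTexture;
void main () {
    highp vec4 tmp;
    tmp = (texture2D(uColorTexture, vTexCoord) + texture2D(uColorTexture, vTexCoord)) / 2.0;
    gl_FragColor = tmp;
}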

Related

Metal Shader vertex attributes - cannot convert attribute from MTLAttributeFormatUInt to float1

I have a shader that looks like this:
struct VertexIn {
float a_customIdx [[attribute(0)]];
...
};
vertex float4 vertex_func(VertexIn v_in [[stage_in]], ...) {...}
In my buffer I'm actually passing in a uint32_t for a_customIdx, so in my MTLVertexAttributeDescriptor I specify its type to be MTLAttributeFormatUInt. When I create the RenderPipelineState I get the error:
cannot convert attribute from MTLAttributeFormatUInt to float1
I get the same error if I use MTLAttributeFormatInt, but can successfully convert a MTLAttributeFormatUShort.
Why is this not a valid operation? According to the documentation for format, "Casting any MTLVertexFormat to a float or half is valid".
I know there are multiple ways I can get around this problem, but I'm curious about why this is invalid - perhaps there's something about alignments and byte sizes I'm missing here.
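For what it's worth, one such workaround (just a sketch, assuming the value really is a 32-bit index) is to declare the attribute with a matching integer type in the shader and convert it explicitly where a float is needed:
#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    // uint matches MTLVertexFormatUInt directly, so the pipeline does not
    // have to perform an implicit integer-to-float conversion.
    uint a_customIdx [[attribute(0)]];
};

vertex float4 vertex_func(VertexIn v_in [[stage_in]]) {
    // Convert explicitly wherever a float value is actually needed.
    float idx = float(v_in.a_customIdx);
    return float4(idx, 0.0, 0.0, 1.0);
}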

Metal: unknown type name float4

I'm trying to include a header file in a metal shader.
For a prototype like this,
float4 someFunction(float4 v);
I get this error message,
Unknown type name 'float4'; did you mean 'float'?
It seems it doesn't understand it's a header for a shader program... Although other errors suggest it does. For instance, if I don't specify the address space here,
static float someK = 2.0;
I get this error,
Global variables must have a constant address space qualifier
which can be fixed if I add
constant static float someK = 2.0;
If I use references, I also get these type of errors,
Reference type must include device, threadgroup, constant, or thread address space qualifier
So it does look as if the compiler knows it's a shader. Why doesn't it know about float4? :(
Make sure you have the first two lines in your shader like in this example:
#include <metal_stdlib>
using namespace metal;
float4 someFunction(float4 v);
kernel void compute(texture2d<float, access::write> output [[texture(0)]],
                    uint2 gid [[thread_position_in_grid]])
{
    float4 color = float4(0, 0.5, 0.5, 1);
    output.write(color, gid);
}
This works fine for me.
Try using vector_float4 instead.
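That suggestion applies if the prototype lives in a header you also share with C/Objective-C code; in that case you can pull the vector types from simd instead of metal_stdlib. A sketch (the header name is made up):
// SharedTypes.h (hypothetical shared header)
#include <simd/simd.h>

// vector_float4 comes from simd/simd.h and is visible both to host code
// and to any .metal file that includes this header.
vector_float4 someFunction(vector_float4 v);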

Updating float4 declaration from dx9 to dx11

There's a shader I was given that I'm trying to update to be compatible with the newest Unity 5 (presumably DX11). I don't understand how the basic float4 instantiation from DX9 was working. Can someone help me understand the following syntax and then provide an equivalent DX11 syntax?
I understand that float4 normally takes x,y,z,w or xyz,w as arguments, but what does a single float argument do? Does float4(0.01) make {0.01, 0, 0, 0} or does it make {0.01, 0.01, 0.01, 0.01}?
Original code from the shader:
float4 Multiply19 = float4( 0.01 ) * float4( 0 );
It should make a new float4 with all members (xyzw) set to 0.01 and then multiply all that by 0, effectively making Multiply19 a (0, 0, 0, 0) float4.
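If you want to spell that out for the stricter D3D11 HLSL compiler, an equivalent of that one line (a sketch, nothing Unity-specific) would be:
// Written out explicitly, component by component:
float4 Multiply19 = float4(0.01, 0.01, 0.01, 0.01) * float4(0.0, 0.0, 0.0, 0.0);
// or, using scalar-to-vector casts, which also broadcast the scalar:
// float4 Multiply19 = (float4)0.01 * (float4)0.0;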

GLSL - Change specific Color of Texture to another Color

My textures consist of 4 different colors. I want to change each color to a different color. I tried it the following way:
precision mediump float;
varying lowp vec4 vColor;
varying highp vec2 vUv;
uniform sampler2D texture;
bool inRange( float c1, float c2 ) {
    return abs( c1 - c2 ) < 0.01;
}
void main() {
    vec4 c = texture2D(texture, vUv);
    if ( inRange( c.r, 238.0/255.0 ) && inRange( c.g, 255.0/255.0 ) && inRange( c.b, 84.0/255.0 ) )
        c = vec4( 254.0/255.0, 254.0/255.0, 247.0/255.0, 1.0 );
    else if ( inRange( c.r, 15.0/255.0 ) && inRange( c.g, 59.0/255.0 ) && inRange( c.b, 5.0/255.0 ) )
        c = vec4( 65.0/255.0, 65.0/255.0, 65.0/255.0, 1.0 );
    else if ( inRange( c.r, 157.0/255.0 ) && inRange( c.g, 184.0/255.0 ) && inRange( c.b, 55.0/255.0 ) )
        c = vec4( 254.0/255.0, 247.0/255.0, 192.0/255.0, 1.0 );
    else if ( inRange( c.r, 107.0/255.0 ) && inRange( c.g, 140.0/255.0 ) && inRange( c.b, 38.0/255.0 ) )
        c = vec4( 226.0/255.0, 148.0/255.0, 148.0/255.0, 1.0 );
    gl_FragColor = c;
}
This works, but it's terribly slow. I'm running this on an iPhone, and the calculations aren't that hard, or am I missing something?
Is there a faster way to do this?
Branches are bad for shader performance. Normally, the GPU executes multiple fragment shaders (each for their own fragment) at once. They all run in lockstep -- SIMD processing means that in effect all parallel fragment processors are running the same code but operating on different data. When you have conditionals, it's possible for different fragments to be on different code paths, so you lose SIMD parallelism.
One of the best performance tricks for this sort of application is using a Color Lookup Table. You provide a 3D texture (the lookup table) and use the GLSL texture3D function to look up into it -- the input coordinates are the R, G, and B values of your original color, and the output is the replacement color.
This is very fast, even on mobile hardware -- the fragment shader doesn't have to do any computation, and the texture lookup is usually cached before the fragment shader even runs.
Constructing a lookup table texture is easy. Conceptually, it's a cube that encodes every possible RGB value (the x axis is R from 0.0 to 1.0, the y axis is G, the z axis is B). If you organize it as a 2D image, you can then open it in your favorite image editor and apply any color transformation filters you like to it. The filtered image is your conversion lookup table. There's a decent writeup on the technique here and another in GPU Gems 2. A more general discussion of the technique, applied using Core Image filters, is in Apple's documentation library.
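The fragment shader side then collapses to a single lookup. A rough sketch (uLookupTable is a made-up uniform name; on OpenGL ES 2.0 a true 3D texture needs the OES_texture_3D extension, otherwise the table is typically flattened into a 2D texture):
#extension GL_OES_texture_3D : enable
precision mediump float;
varying highp vec2 vUv;
uniform sampler2D texture;              // the original image
uniform mediump sampler3D uLookupTable; // input RGB -> replacement RGB

void main() {
    vec3 original = texture2D(texture, vUv).rgb;
    // The fetched texel is the replacement color; no branches at all.
    gl_FragColor = vec4(texture3D(uLookupTable, original).rgb, 1.0);
}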
EDIT: It was confirmed by the asker that it is the presence of any branches that causes the incredible slowdown. I will provide an attempt at a branchless solution.
Well, if branches (including using the ternary "?" operator) are unusable, you can only use arithmetic.
A possible solution (which is hideous from a maintenance perspective, but might fit your need) is to map your input color to output color using polynomials that give desired output for the 4 colors you care about. I treated the 3 RGB color channels separately and plugged in the input/output points into wolfram alpha with a cubic fit (example for the red channel here: http://www.wolframalpha.com/input/?i=cubic+fit+%7B238.0%2C+254.0%7D%2C%7B15.0%2C+65.0%7D%2C+%7B157.0%2C+254.0%7D%2C+%7B107.0%2C+226.0%7D). You could use any polynomial fit program for this purpose.
The code for the red channel is then:
float redResult = 20.6606 + 3.15457 * c.r - 0.0135167 * c.r*c.r + 0.0000184102 * c.r*c.r*c.r; // note: this fit maps 0-255 inputs to 0-255 outputs, so scale c.r and the result accordingly
Rinse and repeat the process with the green and blue color channels and you have your shader. Note that you may want to specify the very small coefficients in scientific notation to retain accuracy... I don't know how your particular driver handles floating-point literals.
Even then you may (probably will) have precision issues, but it's worth a shot.
Another possibility is using an approximate Bump Function (I say approximate, since you don't actually care about the smoothness constraints). You just want a value that's 1 at the color you care about and 0 everywhere else far enough away. Say you have a three-component bump function, bump3, that takes a vec3 for the location of the bump and a vec3 for the location at which to evaluate the function. Then you can rewrite your first conditional from:
if ( inRange( c.r, 238.0/255.0 ) && inRange( c.g, 255.0/255.0 ) && inRange( c.b, 84.0/255.0 ) )
c = vec4( 254.0/255.0, 254.0/255.0, 247.0/255.0, 1.0 );
to:
vec3 colorIn0  = vec3(238.0/255.0, 255.0/255.0,  84.0/255.0);
vec3 colorOut0 = vec3(254.0/255.0, 254.0/255.0, 247.0/255.0);
result.rgb = c.rgb + bump3(colorIn0, c.rgb) * (colorOut0 - colorIn0);
If max/min are fast on your hardware (they might be full branches under the hood :( ), a possible quick and dirty bump3() implementation might be:
float bump3(vec3 b, vec3 p) {
vec3 diff = abs(b-p);
return max(0.0, 1.0 - 255.0*(diff.x + diff.y + diff.z));
}
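Applied to all four of your color pairs (a sketch using the colors from your shader, untested), the branch-free body would look roughly like:
vec4 c = texture2D(texture, vUv);
vec3 result = c.rgb;

// Each line adds (out - in) only where c.rgb is close to the "in" color,
// because bump3 is 1.0 there and 0.0 everywhere else.
result += bump3(vec3(238.0, 255.0,  84.0) / 255.0, c.rgb) * (vec3(254.0, 254.0, 247.0) - vec3(238.0, 255.0,  84.0)) / 255.0;
result += bump3(vec3( 15.0,  59.0,   5.0) / 255.0, c.rgb) * (vec3( 65.0,  65.0,  65.0) - vec3( 15.0,  59.0,   5.0)) / 255.0;
result += bump3(vec3(157.0, 184.0,  55.0) / 255.0, c.rgb) * (vec3(254.0, 247.0, 192.0) - vec3(157.0, 184.0,  55.0)) / 255.0;
result += bump3(vec3(107.0, 140.0,  38.0) / 255.0, c.rgb) * (vec3(226.0, 148.0, 148.0) - vec3(107.0, 140.0,  38.0)) / 255.0;

gl_FragColor = vec4(result, 1.0);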
Other possibilities for bump3 might be abusing smoothstep (again, if it is fast on your hardware) or using the exponential.
The polynomial approach has the added (incidental) benefit of generalizing your map to more than just the four colors, but requires many arithmetic operations, is a maintenance nightmare, and likely suffers from precision issues. The bump function approach, on the other hand, should produce the same results as your current shader, even on input that is not one of those four colors, and is much more readable and maintainable (adding another color pair is trivial, compared to the polynomial approach). However, in the implementation I gave, it uses a max, which might be a branch under the hood (I hope not, geez).
Original answer below
It would be good to know how you are getting timing information so we can be sure it's this shader that's slow (you could test this by just making it a pass-through shader as a quick hack... I recommend getting used to using a profiler, though). It seems exceedingly odd that such a straightforward shader is slow.
Otherwise, if your texture truly only has those 4 colors (and that is guaranteed), then you can trivially take the number of inRange calls down from 12 to 3 by removing the if from the last branch (just make it an else) and then testing only the r value of c. I don't know how the iPhone's GLSL optimizer works, but you could further try replacing the if statements with ternary operators and see if that makes a difference. Those are the only changes I can think of, and unfortunately you can't do the definitive optimization if your textures aren't guaranteed to only contain those 4 colors.
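Sketched out (and assuming the texture really does contain only those four colors), that reduced version would be roughly:
vec4 c = texture2D(texture, vUv);
// With only four possible colors, the red channel alone identifies each one.
c = inRange(c.r, 238.0/255.0) ? vec4(254.0/255.0, 254.0/255.0, 247.0/255.0, 1.0)
  : inRange(c.r,  15.0/255.0) ? vec4( 65.0/255.0,  65.0/255.0,  65.0/255.0, 1.0)
  : inRange(c.r, 157.0/255.0) ? vec4(254.0/255.0, 247.0/255.0, 192.0/255.0, 1.0)
  :                             vec4(226.0/255.0, 148.0/255.0, 148.0/255.0, 1.0);
gl_FragColor = c;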
I would again like to point out that you should make sure this shader is causing the slowdown before trying to optimize.

iOS OpenGL 3.0 does not compile version

I am trying out OpenGL 3.0 in Xcode 5.
This is how I compile the shader:
*shader = glCreateShader(type);
glShaderSource(*shader, 1, &source, NULL);
glCompileShader(*shader);
This is my shader:
#version 140
attribute vec4 position;
attribute vec4 color;
varying vec4 colorVarying;
attribute vec2 TexCoordIn;
varying vec2 TexCoordOut;
out int rowIndex;
out int colIndex;
void main(void) {
colorVarying = color;
gl_Position = position;
TexCoordOut = TexCoordIn;
}
I try:
glGetString(GL_SHADING_LANGUAGE_VERSION);
returns 235, which is expected, but I get
ERROR: 0:1: '' : version '140' is not supported
from the compile log. I have tried many version numbers and only 100 worked. Then I get
Invalid qualifiers 'out' in global variable context
What's wrong? I am running this on the iPhone 4 64-bit simulator, on my MacBook Air with Intel HD Graphics 3000 (384 MB).
I'm confused about what you're trying to do. You say you want to use OpenGL 3.0, yet you have tagged the question OpenGL ES 3.0. Since you're talking about the iPhone simulator, I'll assume you want to use OpenGL ES 3.0. You should note that among Apple mobile devices, only the iPhone 5S has an OpenGL ES 3.0-capable GPU. I don't know how well ES 3.0 works in the simulator; at the least you should try the iPhone 5S simulator, if such a thing is available.
As for your shader, it looks like a merry mix of different language versions. First, for OpenGL ES 3.0 you need #version 300 es. The only other allowed version in OpenGL ES is #version 100 for ES 2.0. I don't know why you're trying #version 140, since that is only for (desktop) OpenGL 3.1. I also don't know why you are expecting and getting 235 from GL_SHADING_LANGUAGE_VERSION; per the spec, it should return OpenGL ES GLSL ES N.M ..., where N is the major and M the minor language version number.
Then, as noted by SurvivalMachine, you need to replace your varyings with out and attributes with in.
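For reference, your vertex shader rewritten along those lines for OpenGL ES 3.0 (a sketch; note that integer vertex outputs such as rowIndex must additionally be qualified flat) would be:
#version 300 es
in vec4 position;
in vec4 color;
in vec2 TexCoordIn;

out vec4 colorVarying;
out vec2 TexCoordOut;
// Integer outputs require flat interpolation in GLSL ES 3.00.
flat out int rowIndex;
flat out int colIndex;

void main(void) {
    colorVarying = color;
    gl_Position = position;
    TexCoordOut = TexCoordIn;
}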
