sRGB on iOS OpenGL ES 2.0

From the few related topics I could find, I gather that the exponentiation step needed for proper lighting computations may have to be done in the final fragment shader in an iOS app.
I have been profiling with the latest Xcode 5 OpenGL ES debugger, and the exponentiation of the fragment accounts for a significant amount of computation. It was the single line that took the longest in the entire shader (most of the remaining time went to the various norm calls needed for the point lights).
glEnable(GL_FRAMEBUFFER_SRGB); unfortunately does not work as GL_FRAMEBUFFER_SRGB is not declared.
Of course the actual enum I should be using for GL ES may be different.
According to Apple:
The following extensions are supported for the SGX 543 and 554
processors only:
EXT_color_buffer_half_float
EXT_occlusion_query_boolean
EXT_pvrtc_sRGB
EXT_shadow_samplers
EXT_sRGB
EXT_texture_rg
OES_texture_half_float_linear
Well, that's nice, the newest device that does not have a 543 or 554 is the iPhone 4.
From the extension's text file it looks like I can pass SRGB8_ALPHA8_EXT as the internalformat parameter of RenderbufferStorage, but nothing is said about how to get the normal final framebuffer to apply sRGB for us for free.
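For what it's worth, I take it the offscreen path from the extension would look something like this (untested sketch; width and height are placeholders, and a framebuffer object is assumed to be bound):
// Untested sketch: an sRGB color renderbuffer via EXT_sRGB, attached to an offscreen FBO.
GLuint colorRenderbuffer = 0;
glGenRenderbuffers(1, &colorRenderbuffer);
glBindRenderbuffer(GL_RENDERBUFFER, colorRenderbuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_SRGB8_ALPHA8_EXT, width, height);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, colorRenderbuffer);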
Now the sRGB correction seems like the missing step to get the correct colors. What I've been doing in my app to deal with the horrible "underexposed" colors is manually applying gamma correction like this in the fragment shader:
// declared outside of main(); specifies a constant 1.8 gamma
const mediump float gammaf = 1.0 / 1.8;
mediump vec4 gamma = vec4(gammaf, gammaf, gammaf, 1.0);

gl_FragColor = pow(color, gamma); // last line of main()
Now I recognize that the typical render pipeline involves one or more renders to a texture followed by a fullscreen quad draw, which would give me the opportunity to use an SRGB8_ALPHA8_EXT renderbuffer, but what am I supposed to do without one? Am I SOL?
If that is the case, the pow call is sucking up so much time that it almost seems like I can squeeze some more perf out of it by building a 1D texture to sample and use as a gamma lookup table. This texture could then be used to tweak the output color intensities in custom ways (and get a much better approximation to sRGB compared to just the raw exponentiation). But it just all seems kind of wrong because supposedly sRGB is free.
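If I did go the LUT route, I imagine it would look roughly like this (untested sketch; every name here is my own placeholder, and since ES 2.0 has no 1D textures I'd use a 256x1 2D texture):
// Untested sketch: build a 256x1 gamma lookup texture (needs <math.h> for powf).
GLuint gammaLUT = 0;
unsigned char lut[256 * 4];
for (int i = 0; i < 256; i++) {
    float encoded = powf(i / 255.0f, 1.0f / 1.8f); // or a proper sRGB encoding curve
    lut[i * 4 + 0] = lut[i * 4 + 1] = lut[i * 4 + 2] = (unsigned char)(encoded * 255.0f + 0.5f);
    lut[i * 4 + 3] = 255;
}
glGenTextures(1, &gammaLUT);
glBindTexture(GL_TEXTURE_2D, gammaLUT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 1, 0, GL_RGBA, GL_UNSIGNED_BYTE, lut);
// In the fragment shader the pow would become three lookups, e.g.:
//   gl_FragColor.r = texture2D(u_gammaLUT, vec2(color.r, 0.5)).r;   // and likewise for g and b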
Also somewhat alarming is the absence of any mention of the string "sRGB" anywhere in the GL ES 2.0 spec. According to the makers of GLM, GL ES simply ignores sRGB entirely.
I know that my code renders textures correctly (I made a basic OpenGL-powered image viewer that renders PVRTC textures) and they did not come out "dimmed". I think what happens there is that, because GL ES 2.0 is not sRGB-aware, the textures are loaded as if they were linear and written back out the same way. Since no lighting is applied (every color is effectively multiplied by 1.0), nothing bad happens to the result.

iOS 7.0 adds the new color format kEAGLColorFormatSRGBA8, which you can set instead of kEAGLColorFormatRGBA8 (the default value) for the kEAGLDrawablePropertyColorFormat key in the drawableProperties dictionary of a CAEAGLLayer. If you’re using GLKit to manage your main framebuffer for you, you can get GLKView to create a sRGB renderbuffer by setting its drawableColorFormat property to GLKViewDrawableColorFormatSRGBA8888.
Note that the OpenGL ES version of EXT_sRGB behaves as if GL_FRAMEBUFFER_SRGB is always enabled. If you want to render without sRGB conversion to/from the destination framebuffer, you’ll need to use a different attachment with a non-sRGB internal format.

I think you are getting confused between the EXT_sRGB and ARB_framebuffer_sRGB extensions. EXT_sRGB is the more recent extension, and it is the one supported by iOS devices. It differs from ARB_framebuffer_sRGB in one important way: it is not necessary to call glEnable(GL_FRAMEBUFFER_SRGB) on the framebuffer to enable gamma correction; it is always enabled. All you need to do is create the framebuffer with an sRGB internal format and render linear textures to it.
This is not hugely useful on its own, as textures are rarely in a linear colour space. Fortunately, the extension also includes the ability to convert sRGB textures to linear space. By uploading your textures with an sRGB internal format (SRGB_ALPHA_EXT in ES 2.0), they will be converted into linear space when sampled in a shader, for free. This lets you use sRGB textures with their better perceptually encoded colour range, blend in higher-precision linear space, then encode the result back to sRGB in the renderbuffer, with accurate gamma correction and no shader cost.
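For example, the upload might look like this (a sketch; diffuseTexture, width, height, and pixels are placeholder names, and with EXT_sRGB on ES 2.0 the unsized GL_SRGB_ALPHA_EXT enum is used as both format and internal format):
// Sketch: upload an 8-bit sRGB texture; sampling it in a shader then returns linear values.
glBindTexture(GL_TEXTURE_2D, diffuseTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_SRGB_ALPHA_EXT, width, height, 0,
             GL_SRGB_ALPHA_EXT, GL_UNSIGNED_BYTE, pixels);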

Here are my test results. My only iOS 7 device is an A7-powered iPad5, and in order to test fillrate I had to tweak my test app a bit to enable blending. On iOS 6.1 that was sufficient to prevent fragment-discarding optimizations on opaque geometry, but for iOS 7 I also needed the shader to write a gl_FragColor.w value other than 1.0. Not a problem.
Using GLKViewDrawableColorFormatSRGBA8888 does indeed appear to be free, or close to free, in terms of performance. I do not have a proper timedemo-style benchmark setup, so I am just testing "similar" scenes; removing the pow shaved around 2 ms off the frame time (e.g. from 43 ms to 41 ms, or roughly 22 fps to 24 fps). Then, setting the sRGB framebuffer color format did not introduce a noticeable increase in the frame time as reported by the debugger, but this isn't very scientific and it certainly could have slowed things by half a millisecond or so. I can't tell whether it is completely free (i.e. fully using a hardware path for the final sRGB transformation) without first building better benchmarking software, but I already have the problem solved, so more rigorous testing will have to wait.

Related

WebGL: does using a single-channel texture format actually save (texture) memory?

From the Khronos ref pages:
GL_LUMINANCE: Each element is a single luminance value. The GL converts it to floating point, then assembles it into an RGBA element by replicating the luminance value three times for red, green, and blue and attaching 1 for alpha.
Does this apply to WebGL too? If so, does this imply that using textures formatted with less channels such as LUMINANCE does not save VRAM compared to using RGBA?
And how about RAM?
Does this apply to WebGL too?
Yes it does
does this imply that using textures formatted with less channels such as LUMINANCE does not save VRAM compared to using RGBA
No. While this is implementation specific (and some implementations do choose to expand the data to RGBA before uploading), the expansion ought to happen on the fly, essentially providing one and the same value for every color component when such a texture is sampled in a shader.
And how about RAM?
Once you call texImage2D, the data is uploaded to VRAM and is not kept in RAM, as long as you don't keep it there yourself (e.g. by holding on to a reference to the data).
GL_LUMINANCE was deprecated and removed in core OpenGL 3.2. Nowadays you specify the internal format explicitly with enums like GL_R8. Implementations generally allocate their internal storage with the specified format (even though the OpenGL spec doesn't strictly guarantee this, per the as-if rule). I recommend not using GL_LUMINANCE in WebGL either. Just use an explicit internal format and expand it in the shader as needed, or through texture swizzles.
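For instance, in GL 3.x / ES 3.0 terms (a sketch; lumTexture, width, height, and pixels are placeholders, and WebGL 2 exposes the same R8/RED enums but not the swizzle parameters):
// Sketch: single-channel texture with an explicit sized internal format.
glBindTexture(GL_TEXTURE_2D, lumTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0, GL_RED, GL_UNSIGNED_BYTE, pixels);
// Either expand it in the shader:  vec3 rgb = texture(u_tex, uv).rrr;
// ...or, outside WebGL, let texture swizzling replicate the channel:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_G, GL_RED);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_B, GL_RED);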

What is the DirectX 11 equivalent of dev->SetRenderState(D3DRS_ALPHAREF, value);

I had a good search before asking here. This question:
How to set RenderState in DirectX11?
is far too general; in studying the first answer, I suspect I need the Blend State, but it's not obvious how to set up an alpha comparison.
And searching stack overflow for D3DRS_ALPHAREF produced only seven other questions: https://stackoverflow.com/search?q=D3DRS_ALPHAREF none of which are even remotely close.
I'm using this for a program that does a two pass render to transition from one image to a second. I have a control texture that is the same size as the textures I'm rendering, and is single channel luminance.
The last lines of my pixel shader are:
// Copy rgb from the source texture
output.color.rgb = source.color.rgb;
// Copy alpha from the control texture
output.color.a = control.color.r;
return output;
Then in my render setup I have:
DWORD const reference = static_cast<DWORD>(frameNum);
D3DCMPFUNC const compare = pass == 0 ? D3DCMP_GREATEREQUAL : D3DCMP_LESS;
m_pd3dDevice->SetRenderState(D3DRS_ALPHAREF, reference);
m_pd3dDevice->SetRenderState(D3DRS_ALPHAFUNC, compare);
Where frameNum is the current frame number of the transition: 0 through 255.
-- Edit -- For those not intimately familiar with this particular capability of DirectX 9, the final stage uses the compare function to compare the alpha output from the pixel shader with the reference value, and then it actually draws the pixel iff the comparison returns a true value.
The net result of all this is that the luminance level of the control texture controls how early or late each pixel changes in the transition.
So, how exactly do I do this with DirectX 11?
Yes, I realize there are other ways to achieve the same result, passing frameNum to a suitably crafted pixel shader could get me to the same place.
That's not the point here, I'm not looking for an alternative implementation, I am looking to learn how to do alpha comparisons in DirectX 11, since they have proven a useful tool from time to time in DirectX 9.
If you are moving from Direct3D 9 to Direct3D 11, it is useful to take a brief stop at what changed in Direct3D 10. This is covered in detail on MSDN. One of the points in that article is:
Removal of Fixed Function
It is sometimes surprising that even in a Direct3D 9 engine that fully exploits the programmable pipeline, there remains a number of areas that depend on the fixed-function (FF) pipeline. The most common areas are usually related to screen-space aligned rendering for UI. It is for this reason that you are likely to need to build a FF emulation shader or set of shaders which provide the necessary replacement behaviors.
This documentation contains a white paper containing replacement shader sources for the most common FF behaviors (see Fixed Function EMU Sample). Some fixed-function pixel behavior including alpha test has been moved into shaders.
IOW: You do this in a programmable shader in Direct3D 10 or later.
Take a look at DirectX Tool Kit and in particular the AlphaTestEffect (implemented in this cpp and shader file).
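To illustrate the idea (this is my own sketch, not the Tool Kit's shader, and the cbuffer variable names are made up): in Direct3D 11 the alpha test becomes a clip() in the HLSL pixel shader, with the reference value and comparison mode fed in through a constant buffer.
cbuffer AlphaTest : register(b0)
{
    float g_alphaRef;   // e.g. frameNum / 255.0
    float g_lessThan;   // 0 = GREATEREQUAL (pass 0), 1 = LESS (pass 1)
};
Texture2D    sourceTex  : register(t0);
Texture2D    controlTex : register(t1);
SamplerState linearSamp : register(s0);

float4 main(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 color = sourceTex.Sample(linearSamp, uv);
    color.a = controlTex.Sample(linearSamp, uv).r;
    // Emulate D3DRS_ALPHAREF/D3DRS_ALPHAFUNC: discard the pixel when the comparison fails.
    float diff = color.a - g_alphaRef;
    clip(g_lessThan > 0.5 ? -diff - 0.5 / 255.0 : diff);
    return color;
}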

Most efficient way of multi-texturing in OpenGL ES 2 on iOS

I'm trying to find the most efficient way of handling multi-texturing in OpenGL ES2 on iOS. By 'efficient' I mean the fastest rendering even on older iOS devices (iPhone 4 and up) - but also balancing convenience.
I've considered (and tried) several different methods. But have run into a couple of problems and questions.
Method 1 - My base and normal values are RGB with NO ALPHA. For these objects I don't need transparency. My emission and specular information are each only one channel. To reduce texture2D() calls I figured I could store the emission as the alpha channel of the base, and the specular as the alpha of the normal, with each pair packed into its own file: base.rgb + emission.a and normal.rgb + specular.a.
My problem so far has been finding a file format that will support a full non-premultiplied alpha channel. PNG just hasn't worked for me: every way I've tried to save this as a PNG premultiplies the .alpha with the .rgb on file save (via Photoshop), basically destroying the .rgb. Any pixel with a 0.0 alpha has a black RGB when I reload the file. I posted that question here with no activity.
I know this method would yield faster renders if I could work out a way to save and load this independent 4th channel. But so far I haven't been able to and had to move on.
Method 2 - When that didn't work I moved on to a single 4-way texture where each quadrant has a different map. This doesn't reduce texture2D() calls but it reduces the number of textures that are being accessed within the shader.
The 4-way texture does require that I modify the texture coordinates within the shader. For model flexibility I leave the texcoords as is in the model's structure and modify them in the shader like so:
v_fragmentTexCoord0 = a_vertexTexCoord0 * 0.5;
v_fragmentTexCoord1 = v_fragmentTexCoord0 + vec2(0.0, 0.5); // illumination frag is up half
v_fragmentTexCoord2 = v_fragmentTexCoord0 + vec2(0.5, 0.5); // shininess frag is up and over
v_fragmentTexCoord3 = v_fragmentTexCoord0 + vec2(0.5, 0.0); // normal frag is over half
To avoid dynamic texture lookups (Thanks Brad Larson) I moved these offsets to the vertex shader and keep them out of the fragment shader.
But my question here is: Does reducing the number of texture samplers used in a shader matter? Or would I be better off using 4 different smaller textures here?
The one problem I did have with this was bleed-over between the different maps. A texcoord of 1.0 was averaging in some of the blue normal-map pixels due to linear texture filtering. This added a blue edge on the object near the seam. To avoid it I had to change my UV mapping to stay a bit away from the edge, and that's a pain to do with very many objects.
Method 3 would be to combine methods 1 and 2, with base.rgb + emission.a on one side and normal.rgb + specular.a on the other. But again, I still have the problem of getting an independent alpha to save into a file.
Maybe I could save them as two files but combine them during loading, before sending the result over to OpenGL. I'll have to try that.
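Something like this at load time, for instance (untested sketch; rgbPixels, alphaPixels, width and height are just whatever my image loader hands back):
// Untested sketch: interleave a 3-channel RGB image and a 1-channel map into one RGBA buffer.
unsigned char *combined = (unsigned char *)malloc(width * height * 4);
for (int i = 0; i < width * height; i++) {
    combined[i * 4 + 0] = rgbPixels[i * 3 + 0];
    combined[i * 4 + 1] = rgbPixels[i * 3 + 1];
    combined[i * 4 + 2] = rgbPixels[i * 3 + 2];
    combined[i * 4 + 3] = alphaPixels[i];   // emission (or specular) rides in the alpha channel
}
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, combined);
free(combined);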
Method 4 - Finally, in a 3D world, if I have 20 different panel textures for walls, should these be individual files or all packed into a single texture atlas? I recently noticed that at some point Minecraft moved from an atlas to individual textures - albeit they are 16x16 each.
With a single model and by modifying the texture coordinates (which I'm already doing in method 2 and 3 above), you can easily send an offset to the shader to select a particular map in an atlas:
v_fragmentTexCoord0 = u_texOffset + a_vertexTexCoord0 * u_texScale;
This offers a lot of flexibility and reduces the number of texture bindings. It's basically how I'm doing it in my game now. But IS IT faster to access a small portion of a larger texture and have the above math in the vertex shader? Or is it faster to repeatedly bind smaller textures over and over? Especially if you're not sorting objects by texture.
I know this is a lot. But the main question here is what's the most efficient method considering speed + convenience? Will method 4 be faster for multiple textures or would multiple rebinds be faster? Or is there some other way that I'm overlooking. I see all these 3d games with a lot of graphics and area coverage. How do they keep frame rates up, especially on older devices like the iphone4?
**** UPDATE ****
Since I've suddenly had 2 answers in the last few days I'll say this: basically, I did find the answer. Or AN answer. The question is which method is more efficient, meaning which method will result in the best frame rates. I've tried the various methods above, and on the iPhone 5 they're all just about as fast. The iPhone 5/5S has an extremely fast GPU. Where it matters is on older devices like the iPhone 4/4S, or on larger devices like a Retina iPad. My tests were not scientific and I don't have ms timings to report. But 4 texture2D() calls to 4 RGBA textures was actually just as fast or maybe even faster than 4 texture2D() calls to a single texture with offsets. And of course I do those offset calculations in the vertex shader and not the fragment shader (never in the fragment shader).
So maybe someday I'll do the tests and make a grid with some numbers to report. But I don't have time to do that right now and write a proper answer myself. And I can't really checkmark any other answer that isn't answering the question cause that's not how SO works.
But thanks to the people who have answered. And check out this other question of mine that also answered some of this one: Load an RGBA image from two jpegs on iOS - OpenGL ES 2.0
Have a post-process step in your content pipeline where you merge your RGB texture with the alpha texture and store the result in a .ktx file, either when you package the game or as a post-build event when you compile.
It's a fairly trivial format, and it would be simple to write a command-line tool that loads two PNGs and merges them into one .ktx, RGB + alpha.
Some benefits of doing that:
- Less CPU overhead when loading the file at game startup, so the game starts quicker.
- Some GPUs do not natively support a 24-bit RGB format, which would force the driver to internally convert it to 32-bit RGBA. This adds more time to the loading stage and temporary memory usage.
Once you have the data in a texture object, you do want to minimize texture sampling, as it means a lot of GPU operations and memory accesses, depending on the filtering mode.
I would recommend having 2 textures with 2 layers each. If you add all of them to the same texture, you risk artifacts when you sample with bilinear filtering or mipmapping, since the filter may pull in neighbouring pixels close to the edge where one texture layer ends and the next begins; generated mipmaps make this worse.
As an extra improvement I would recommend not storing raw 32-bit RGBA data in the .ktx, but compressing it into a DXT or PVRTC format. This uses much less memory, which means faster loading times and fewer memory transfers for the GPU, since memory bandwidth is limited.
Of course, adding the compressor to the post-process tool is slightly more complex.
Do note that compressed textures do lose a bit of quality, depending on the algorithm and implementation.
Silly question, but are you sure you are sampler limited? It just seems to me that, with your "two 2-way textures", you are potentially pulling in a lot of texture data, and you might instead be bandwidth limited.
What if you were to use 3 textures [ BaseRGB, NormalRBG, and combined Emission+Specular] and use PVRTC compression? Depending on the detail, you might even be able to use 2bpp (rather than 4bpp) for the BaseRGB and/or Emission+Specular.
For the Normals I'd probably stick to 4bpp. Further, if you can afford the shader instructions, only store the R&G channels (putting 0 in the blue channel) and re-derive the blue channel with a bit of maths. This should give better quality.
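Something along these lines in the fragment shader, for example (a sketch; u_normalMap and v_texCoord are placeholder names, and it assumes the normal's X and Y are stored remapped to 0..1 in the R and G channels):
// Sketch: re-derive the normal's Z from its X and Y components.
mediump vec2 nxy = texture2D(u_normalMap, v_texCoord).rg * 2.0 - 1.0;
mediump float nz = sqrt(max(0.0, 1.0 - dot(nxy, nxy)));
mediump vec3 normal = vec3(nxy, nz);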

OpenGL photoshop overlay blend mode

I'm trying to implement a particle system (using OpenGL ES 2.0) where each particle is a quad with a simple texture in which the red pixels are transparent. Each particle will have a random alpha value from 50% to 100%.
Now the tricky part: I'd like each particle to have a blend mode much like Photoshop's "Overlay". I've tried many different combinations with glBlendFunc(), but without luck.
I don't understand how I could implement this in a fragment shader, since I need information about the color already in the framebuffer, so that I can calculate a new color based on the current and texture colors.
I also thought about using a framebuffer object, but I guess I would need to re-render the FBO into a texture for each particle, every frame, since I need the already-blended fragment color wherever particles overlap each other.
I've found the math and other information regarding the Overlay calculation, but I have a hard time figuring out which direction to take to implement this.
http://www.pegtop.net/delphi/articles/blendmodes/
Photoshop blending mode to OpenGL ES without shaders
I'm hoping to get an effect like this:
You can get information about the current fragment color in the framebuffer on an iOS device. Programmable blending has been available through the EXT_shader_framebuffer_fetch extension since iOS 6.0 (on every device supported by that release). Just declare that extension in your fragment shader (by putting the directive #extension GL_EXT_shader_framebuffer_fetch : require at the top) and you'll get current fragment data in gl_LastFragData[0].
And then, yes, you can use that in the fragment shader to implement any blending mode you like, including all the Photoshop-style ones. Here's an example of a Difference blend:
// compute srcColor earlier in shader or get from varying
gl_FragColor = abs(srcColor - gl_LastFragData[0]);
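For the Overlay mode the question asks about, a per-channel version might look like this (a sketch; srcColor is the incoming particle colour, as above):
// Sketch: Photoshop-style Overlay using the destination colour from framebuffer fetch.
mediump vec4 dst = gl_LastFragData[0];
mediump vec3 multiplyPart = 2.0 * dst.rgb * srcColor.rgb;
mediump vec3 screenPart = 1.0 - 2.0 * (1.0 - dst.rgb) * (1.0 - srcColor.rgb);
gl_FragColor = vec4(mix(multiplyPart, screenPart, step(0.5, dst.rgb)), srcColor.a);
// (you could also mix the result with dst.rgb by srcColor.a to respect each particle's random alpha)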
You can also use this extension for effects that don't blend two colors. For example, you can convert an entire scene to grayscale -- render it normally, then draw a quad with a shader that reads the last fragment data and processes it:
mediump float luminance = dot(gl_LastFragData[0], vec4(0.30,0.59,0.11,0.0));
gl_FragColor = vec4(luminance, luminance, luminance, 1.0);
You can do all sorts of blending modes in GLSL without framebuffer fetch, but that requires rendering to multiple textures and then drawing a quad with a shader that blends them. Compared to framebuffer fetch that's an extra draw call and a lot of schlepping pixels back and forth between shared and tile memory, so the framebuffer fetch method is a lot faster.
On top of that, there's no saying that framebuffer data has to be color... if you're using multiple render targets in OpenGL ES 3.0, you can read data from one and use it to compute data that you write to another. (Note that the extension works differently in GLSL 3.0, though. The above examples are GLSL 1.0, which you can still use in an ES3 context. See the spec for how to use framebuffer fetch in a #version 300 es shader.)
I suspect you want this configuration:
Source: GL_SRC_ALPHA
Destination: GL_ONE
Equation: GL_FUNC_ADD
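In GL calls, that configuration would be (a sketch; GL_FUNC_ADD is the ES 2.0 name for the additive equation, and it is also the default):
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE);
glBlendEquation(GL_FUNC_ADD);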
If not, it might be helpful if you could explain the math of the filter you're hoping to get.
[EDIT: the answer below is true for OpenGL and OpenGL ES pretty much everywhere except iOS since 6.0. See rickster's answer for information about EXT_shader_framebuffer_fetch which, in ES 3.0 terms, allows a target buffer to be flagged as inout, and introduces a corresponding built-in variable under ES 2.0. iOS 6.0 is over a year old at the time of writing so there's no particular excuse for my ignorance; I've decided not to delete the answer because it's potentially valid to those finding this question based on its opengl-es, opengl-es-2.0 and shader tags.]
To confirm briefly:
- the OpenGL blend modes are implemented in hardware and occur after the fragment shader has concluded;
- you can't programmatically specify a blend mode;
- you're right that the only workaround is to ping-pong, swapping the target buffer and a source texture for each piece of geometry (so you draw from the first to the second, then back from the second to the first, etc.).
Per Wikipedia and the link you provided, Photoshop's Overlay mode is defined so that, for a background value a and a foreground value b, the output is f(a, b) = 2ab if a < 0.5, and 1 - 2(1 - a)(1 - b) otherwise.
So the blend mode changes per pixel depending on the colour already in the colour buffer. And each successive draw's decision depends on the state the colour buffer was left in by the previous.
So there's no way you can avoid writing that as a ping pong.
The closest you're going to get without all that expensive buffer swapping is probably, as Sorin suggests, to try to produce something similar using purely additive blending. You could juice that a little by adding a final ping-pong stage that converts all values from their linear scale to the S-curve that you'd see if you overlaid the same colour onto itself. That should give you the big variation where multiple circles overlap.

GPUImage : YUV or RGBA impact on performance?

I'm working on some still image processing, and GPUImage is a really awesome framework (thank you Brad Larson!).
I understand that :
some filters can be done with only 1 component. In this case, the image should be YUV (YCbCr), and we use only Y (luma = image grey level).
other filters need all color information from the 3 components - R,G and B.
YUV -> RGB conversion is provided (in GPUImageVideoCamera); RGB -> YUV may be hard-coded into the fragment shader (e.g. GPUImageChromaKeyFilter)
I have many image-processing steps, some which can be based on YUV, others on RGB.
Basically, I want to mix RGB and YUV filters, so my general question is this :
What is the cost / information-loss of such successive conversions, and would you recommend any design ?
Thanks!
(PS : what is the problem with iPhone4 YUV->RGB conversion & AVCaptureStillImageOutput pixel-Format ?)
The use of YUV in GPUImage is a fairly new addition, and something I'm still experimenting with. I wanted to pull in YUV to try to improve filter performance, reduce memory usage, and possibly increase color fidelity. So far, my modifications have only achieved one of these three.
As you can see, I pull in YUV frames from the camera and then decide what to do with them at subsequent stages in the filter pipeline. If all of the filters that the camera input targets only want monochrome inputs, the camera input will send only the unprocessed Y channel texture on down the pipeline. If any of the filters need RGB input, the camera input will perform a shader-based conversion from YUV->RGB.
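Roughly, that conversion looks like this (a simplified sketch rather than the framework's exact shader; it assumes the CbCr plane is bound as a luminance-alpha texture and uses full-range BT.601 math):
// Sketch: full-range BT.601 YUV -> RGB in the fragment shader.
varying highp vec2 textureCoordinate;
uniform sampler2D luminanceTexture;    // Y plane
uniform sampler2D chrominanceTexture;  // interleaved CbCr plane (luminance-alpha)
void main()
{
    mediump float y    = texture2D(luminanceTexture, textureCoordinate).r;
    mediump vec2  cbcr = texture2D(chrominanceTexture, textureCoordinate).ra - vec2(0.5);
    gl_FragColor = vec4(y + 1.402 * cbcr.y,
                        y - 0.344 * cbcr.x - 0.714 * cbcr.y,
                        y + 1.772 * cbcr.x,
                        1.0);
}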
For filters that take in monochrome, this can lead to a significant performance boost with the elimination of the RGB conversion phase (done by AV Foundation when requesting BGRA data, or in my conversion shader), as well as a redundant conversion of RGB back to luminance. On an iPhone 4, the performance of the Sobel edge detection filter running on 720p frames goes from 36.0 ms per frame with RGB input to 15.1 ms using the direct Y channel. This also avoids a slight loss of information due to rounding from converting YUV to RGB and back to luminance. 8-bit color channels only have so much dynamic range.
Even when using RGB inputs, the movement of this conversion out of AV Foundation and into my shader leads to a performance win. On an iPhone 4S, running a saturation filter against 1080p inputs drops from 2.2 ms per frame to 1.5 ms per frame with my conversion shader instead of AV Foundation's built-in BGRA output.
Memory consumption is nearly identical between the two RGB approaches, so I'm experimenting with a way to improve this. For monochrome inputs, memory usage drops significantly due to the smaller texture size of the inputs.
Implementing an all-YUV pipeline is a little more challenging, because you would need to maintain parallel rendering pathways and shaders for the Y and UV planes, with separate input and output textures for both. Extracting planar YUV from RGB is tricky, because you'd need to somehow pull two outputs from one input, something that isn't natively supported in OpenGL ES. You'd need to do two render passes, which is fairly wasteful. Interleaved YUV444 might be more practical as a color format for a multistage pipeline, but I haven't played around with this yet.
Again, I'm just beginning to tinker with this.
