I want to optimize my fragment shader's performance. Currently my fragment shader is:
fragment half4 fragment_shader_texture(VertexOutTexture vIn [[stage_in]],
                                       texture2d<half> texture [[texture(0)]]) {
    constexpr sampler defaultSampler;
    half4 color = texture.sample(defaultSampler, vIn.textureCoordinates);
    return color;
}
Its only task is to return the texture color. Is there any way to optimize it more than this?
There are no options for optimizing the fragment shader itself AFAICT; it's doing virtually nothing other than sampling the texture. However, depending on your situation, there might still be scope for optimization by:
Reducing bandwidth usage by using a more compact texture format (565 or 4444 instead of 8888, or better still 4-bit or 2-bit PVRTC).
Making sure that alpha blending is disabled if alpha blending is not required.
If the texture has lots of 'empty space' (e.g. think of a particle texture with a central circular blob and blank corners), you could make sure the geometry fits it more tightly by rendering it as an octagon rather than as a quad, for instance.
Enable mipmapping if there's any possibility the image will be minified, and disable more expensive filtering options like trilinear/anisotropic filtering unless they're needed (see the sampler sketch below).
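As a rough sketch of the mipmapping point in Metal (this assumes the same VertexOutTexture struct as in the question, and that mipmap levels have actually been generated for the texture on the CPU side; the names are illustrative):

// Cheap sampling: bilinear filtering plus nearest-mip selection, no trilinear
// or anisotropic filtering. Only useful if the texture contains mipmap levels.
constexpr sampler mipSampler(filter::linear,
                             mip_filter::nearest,
                             address::clamp_to_edge);

fragment half4 fragment_shader_texture_mipped(VertexOutTexture vIn [[stage_in]],
                                              texture2d<half> texture [[texture(0)]]) {
    return texture.sample(mipSampler, vIn.textureCoordinates);
}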
I'm trying to understand color within the context of a Metal fragment (or compute) shader.
My understanding is that, within the context of a Metal shader, color values are always linear: for whatever texture is attached to a fragment or compute function, Metal will apply the inverse of any non-linear transfer function (gamma) on the way into the shader, and apply the transfer function again on the way out.
With this in mind, if, within a shader, I return an approximate linear middle-grey value of around 22.25%, then when it's rendered to the screen using MetalKit via a simple .bgra8Unorm texture, I would expect to get a non-linear sRGB reading of around 128,128,128.
fragment float4 fragment_shader(TextureMappingVertex in [[stage_in]]) {
    float middleGrey = float(0.2225);
    return float4(middleGrey, middleGrey, middleGrey, 1);
}
But in fact I get an output of 57,57,57, which is what I would expect if there were no conversion to and from linear color space within the shader.
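For reference, here is the arithmetic behind both readings, sketched as the standard sRGB encode (the constants come straight from the sRGB definition; the function name is just for illustration):

#include <metal_stdlib>
using namespace metal;

// Standard linear -> sRGB encode.
// No encoding:   0.2225 * 255 ≈ 57                               (the value I'm seeing)
// With encoding: 1.055 * pow(0.2225, 1.0/2.4) - 0.055 ≈ 0.51, and 0.51 * 255 ≈ 130
//                (roughly the ~128 I was expecting)
half srgb_encode(half c) {
    return (c <= 0.0031308h) ? c * 12.92h
                             : 1.055h * pow(c, 1.0h / 2.4h) - 0.055h;
}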
What am I missing here?
On the one hand, this certainly seems more intuitive, but it goes against what I thought were the rules for Metal shaders in that they are always in linear space.
The Metal Shading Language includes a lot of mathematical functions, but it seems most of the code in Metal's official documentation just uses them to map vertices from pixel space to clip space, like:
RasterizerData out;
out.clipSpacePosition = vector_float4(0.0, 0.0, 0.0, 1.0);
float2 pixelSpacePosition = vertices[vertexID].position.xy;
vector_float2 viewportSize = vector_float2(*viewportSizePointer);
out.clipSpacePosition.xy = pixelSpacePosition / (viewportSize / 2.0);
out.color = vertices[vertexID].color;
return out;
Aside from GPGPU work using kernel functions for parallel computation, what things can a vertex function do, with some examples? In a game, if all vertex positions are calculated by the CPU, why does the GPU still matter? What does a vertex function usually do?
Vertex shaders compute properties for vertices. That's their point. In addition to vertex positions, they also calculate lighting normals at each vertex. And potentially texture coordinates. And various material properties used by lighting and shading routines. Then, in the fragment processing stage, those values are interpolated and sent to the fragment shader for each fragment.
In general, you don't modify vertices on the CPU. In a game, you'd usually load them from a file into main memory, put them into a buffer and send them to the GPU. Once they're on the GPU you pass them to the vertex shader on each frame along with model, view, and projection matrices. A single buffer containing the vertices of, say, a tree or a car's wheel might be used multiple times. Each time all the CPU sends is the model, view, and projection matrices. The model matrix is used in the vertex shader to reposition and scale the vertices' positions in world space. The view matrix then moves and rotates the world around so that the virtual camera is at the origin and facing the appropriate way. Then the projection matrix modifies the vertices to put them into clip space.
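A minimal sketch of that kind of vertex function in Metal (the struct layouts and buffer indices here are purely illustrative):

#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    float3 position [[attribute(0)]];
    float3 normal   [[attribute(1)]];
    float2 uv       [[attribute(2)]];
};

struct Uniforms {
    float4x4 model;
    float4x4 view;
    float4x4 projection;
};

struct VertexOut {
    float4 position [[position]];
    float3 worldNormal;
    float2 uv;
};

// Per-vertex work: take a model-space position and produce a clip-space position,
// plus whatever the fragment shader needs interpolated (normal, texture coordinates).
vertex VertexOut basic_vertex(VertexIn vIn         [[stage_in]],
                              constant Uniforms &u [[buffer(1)]]) {
    VertexOut out;
    float4 world    = u.model * float4(vIn.position, 1.0);
    out.position    = u.projection * u.view * world;           // clip space
    out.worldNormal = (u.model * float4(vIn.normal, 0.0)).xyz; // ignores non-uniform scale
    out.uv          = vIn.uv;
    return out;
}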
There are other things a vertex shader can do, too. You can pass in vertices that are in a grid in the x-y plane, for example. Then in your vertex shader you can sample a texture and use that to generate the z-value. This gives you a way to change the geometry using a height map.
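A sketch of the height-map idea, reusing the structs from the sketch above (the texture binding is illustrative, and the grid vertices are assumed to carry their own texture coordinates):

// Displace a flat x-y grid by sampling a height texture in the vertex function.
vertex VertexOut heightmap_vertex(VertexIn vIn               [[stage_in]],
                                  constant Uniforms &u       [[buffer(1)]],
                                  texture2d<float> heightMap [[texture(0)]]) {
    constexpr sampler heightSampler(filter::linear, address::clamp_to_edge);

    // No derivatives are available in a vertex function, so sample an explicit mip level.
    float height = heightMap.sample(heightSampler, vIn.uv, level(0)).r;

    VertexOut out;
    float4 displaced = float4(vIn.position.xy, height, 1.0);
    out.position     = u.projection * u.view * u.model * displaced;
    out.worldNormal  = float3(0.0, 0.0, 1.0); // would normally be rebuilt from neighbouring heights
    out.uv           = vIn.uv;
    return out;
}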
On older hardware (and some lower-end mobile hardware) it was expensive to do calculations on a texture coordinate before using it to sample from a texture because you lose some cache coherency. For example, if you wanted to sample several pixels in a column, you might loop over them adding an offset to the current texture coordinate and then sampling with the result. One trick was to do the calculation on the texture coordinates in the vertex shader and have them automatically interpolated before being sent to the fragment shader, then doing a normal look-up in the fragment shader. (I don't think this is an optimization on modern hardware, but it was a big win on some older models.)
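A sketch of that old trick, again reusing VertexIn and Uniforms from the first sketch above (four taps spaced one texel apart vertically, purely as an example):

struct ColumnTapsOut {
    float4 position [[position]];
    float2 tap0;
    float2 tap1;
    float2 tap2;
    float2 tap3;
};

// Vertex stage: precompute the tap coordinates so the fragment stage does no
// texture-coordinate math at all.
vertex ColumnTapsOut column_taps_vertex(VertexIn vIn                [[stage_in]],
                                        constant Uniforms &u        [[buffer(1)]],
                                        constant float &texelHeight [[buffer(2)]]) {
    ColumnTapsOut out;
    out.position = u.projection * u.view * u.model * float4(vIn.position, 1.0);
    out.tap0 = vIn.uv;
    out.tap1 = vIn.uv + float2(0.0, 1.0 * texelHeight);
    out.tap2 = vIn.uv + float2(0.0, 2.0 * texelHeight);
    out.tap3 = vIn.uv + float2(0.0, 3.0 * texelHeight);
    return out;
}

// Fragment stage: plain lookups using the interpolated coordinates.
fragment half4 column_taps_fragment(ColumnTapsOut in    [[stage_in]],
                                    texture2d<half> tex [[texture(0)]]) {
    constexpr sampler s(filter::linear, address::clamp_to_edge);
    half4 sum = tex.sample(s, in.tap0) + tex.sample(s, in.tap1)
              + tex.sample(s, in.tap2) + tex.sample(s, in.tap3);
    return sum * 0.25h;
}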
First, I'll address this statement:
In a game, if all vertex positions are calculated by the CPU, why does the GPU still matter? What does a vertex function usually do?
I don't believe I've seen anyone calculating mesh vertex positions on the CPU that will later be used to render them on a GPU. It's slow: you would need to get all this data from the CPU to the GPU (which means copying it across a bus if you have a dedicated GPU). And it's just not that flexible. There are many more things besides vertex positions required to produce any meaningful image, and calculating all of this on the CPU is just wasteful, since the CPU doesn't care about this data for the most part.
The sole purpose of a vertex shader is to provide the rasterizer with primitives in clip space. But there are some other uses, mostly tricks based on different GPU features.
For example, vertex shaders can write data out to buffers, so you can stream out transformed geometry if you don't want to transform it again in a later vertex stage, e.g. when multi-pass rendering uses the same geometry in more than one pass.
You can also use a vertex shader to output just one triangle that covers the whole screen, so that the fragment shader gets called once per pixel for the whole screen (though, honestly, you are better off using compute (kernel) shaders for this).
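For example, here is a Metal sketch of that full-screen triangle, generated purely from the vertex ID with no vertex buffer at all:

#include <metal_stdlib>
using namespace metal;

struct FullScreenOut {
    float4 position [[position]];
    float2 uv;
};

// Draw this with a 3-vertex draw call; the single triangle covers the whole screen.
vertex FullScreenOut fullscreen_triangle(uint vid [[vertex_id]]) {
    float2 p = float2((vid << 1) & 2, vid & 2);      // (0,0), (2,0), (0,2)
    FullScreenOut out;
    out.position = float4(p * 2.0 - 1.0, 0.0, 1.0);  // (-1,-1), (3,-1), (-1,3) in clip space
    out.uv       = float2(p.x, 1.0 - p.y);           // flip y for Metal's top-left texture origin
    return out;
}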
You can also write out data from the vertex shader without generating any primitives, by emitting degenerate triangles. You can use this to generate bounding boxes: using atomic operations you can update min/max positions and read them at a later stage. This is useful for light culling, frustum culling, tile-based processing, and many other things.
But, and it's a BIG BUT, you can do most of this in a compute shader without making the GPU run the whole vertex assembly pipeline. That means you can do full-screen effects using just a compute shader (instead of a vertex and fragment shader, plus the pipeline stages in between, such as the rasterizer, primitive culling, depth testing, and output merging). You can also calculate bounding boxes and do light or frustum culling in a compute shader.
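As a sketch of that point in Metal, a full-screen effect can be just a kernel that writes one pixel per thread (the texture index and dispatch size are up to the host code):

#include <metal_stdlib>
using namespace metal;

// One thread per pixel; no rasterizer, no vertex assembly, no blending.
kernel void fullscreen_effect(texture2d<half, access::write> output [[texture(0)]],
                              uint2 gid [[thread_position_in_grid]]) {
    if (gid.x >= output.get_width() || gid.y >= output.get_height()) return;
    float2 uv = float2(gid) / float2(output.get_width(), output.get_height());
    output.write(half4(half(uv.x), half(uv.y), 0.0h, 1.0h), gid);
}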
There are reasons to fire up the whole rendering pipeline instead of just running a compute shader: for example, if you will still use the triangles output from the vertex shader, or if you aren't sure how primitives are laid out in memory and need the vertex assembler to do the heavy lifting of assembling them. But, getting back to your point, almost all of the reasonable uses of a vertex shader involve outputting primitives in clip space. If you aren't using the resulting primitives, it's probably best to stick to compute shaders.
I have multiple texture reads in my fragment shader, and I am supposedly doing bad things, like using the discard command and conditionals inside the shader.
The thing is, I am rendering to a texture and I want to reuse it in the following passes with other shaders, which don't have to operate on pixels that were previously "discarded". This is for performance. I also need to skip the calculations if uniforms are out of certain ranges (which I read from another texture): imagine a loop with these shaders always running on the same textures, which are never cleared.
So what I have now is terrible performance. One idea I had is to use gl_FragDepth together with the depth buffer, and let depth testing discard some pixels. But this doesn't work with the fact that I want to have ranges.
Is there any alternative?
You could enable blending, and set the alpha values of pixels you don't want to render to zero. Setup:
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glEnable(GL_BLEND);
Then in the fragment shader, where you previously called discard:
...
if (condition) {
    discard;
}
...
Set the alpha to zero instead:
float alpha = 1.0 - float(condition); // condition true -> alpha 0 (the fragment that was discarded before)
...
gl_FragColor = vec4(r, g, b, alpha);
Whether this will perform better than discarding pixels could be very system dependent. But if you're looking for alternatives, it's worth trying.
I have some GUI-like elements in my WebGL scene which have to be displayed in plain white and not be affected by the lighting in scene.
The texture is a HTML5 canvas element, which has plain white background.
But when I render the canvas in the scene as a texture, it's affected by the lighting and becomes light gray.
How do I display this texture exactly with the same color of the canvas (plain white)?
I suspect the answer is in the fragment shader, and I've already been playing with it for a while... I did once get a plain color but nothing else, and I want to display some text over the background color.
Thanks!
I got it! :D
In the fragment shader I added a boolean uniform:
uniform bool uNoLights;
And when it's set, I use only the texture's pixel information to determine the final color:
if (uNoLights) {
    gl_FragColor = texture2D(uSampler, vec2(vTextureCoord.s, vTextureCoord.t));
} else {
    gl_FragColor = lights * texture2D(uSampler, vec2(vTextureCoord.s, vTextureCoord.t));
}
Another option is to use a separate program for your GUI and your scene.
What do you mean by "program"? A different fragment shader / shaders?
A program is the combination of a vertex shader and a fragment shader. You must have a gl.useProgram call somewhere in your code; I'm talking about setting up two different programs and calling gl.useProgram inside your rendering loop, once before drawing the scene and once before drawing the GUI.
The advantage of this is that you can completely separate the programs, and avoid having any logic that is only needed for the scene wasting time in the GUI, and vice versa.
(The disadvantage is that switching programs itself has a cost, which might be more than the cost of the conditional in this case, but it is quite normal to switch programs within a single frame.)
In the end you have to measure to see which gives better performance for your use case.
I'm trying to implement the technique described at: Compositing Images with Depth.
The idea is to use an existing texture (loaded from an image) as a depth mask, to basically fake 3D.
The problem I face is that glDrawPixels is not available in OpenGL ES. Is there a way to accomplish the same thing on the iPhone?
The depth buffer is more obscured than you think in OpenGL ES; not only is glDrawPixels absent but gl_FragDepth has been removed from GLSL. So you can't write a custom fragment shader to spool values to the depth buffer as you might push colours.
The most obvious solution is to pack your depth information into a texture and to use a custom fragment shader that does a depth comparison between the fragment it generates and one looked up from a texture you supply. Only if the generated fragment is closer is it allowed to proceed. The normal depth buffer will catch other cases of occlusion and — in principle — you could use a framebuffer object to create the depth texture in the first place, giving you a complete on-GPU round trip, though it isn't directly relevant to your problem.
Disadvantages are that drawing will cost you an extra texture unit and textures use integer components.
EDIT: for the purposes of keeping the example simple, suppose you were packing all of your depth information into the red channel of a texture. That'd give you a really low precision depth buffer, but just to keep things clear, you could write a quick fragment shader like:
void main()
{
    // write a value to the depth map
    gl_FragColor = vec4(gl_FragCoord.w, 0.0, 0.0, 1.0);
}
To store depth in the red channel. So you've partially recreated the old depth texture extension — you'll have an image that has a brighter red in pixels that are closer, a darker red in pixels that are further away. I think that in your question, you'd actually load this image from disk.
To then use the texture in a future fragment shader, you'd do something like:
uniform sampler2D depthMap;
uniform mediump vec2 viewportSize; // assumed uniform holding the render target size in pixels

void main()
{
    // read a value from the depth map
    // (gl_FragCoord.xy is in window coordinates, so normalise it to [0, 1] first)
    lowp vec4 colourFromDepthMap = texture2D(depthMap, gl_FragCoord.xy / viewportSize);

    // discard the current fragment if it is less close than the stored value
    if (colourFromDepthMap.r > gl_FragCoord.w) discard;

    // ... set gl_FragColor appropriately otherwise ...
}
EDIT2: you can see a much smarter mapping from depth to an RGBA value here. To tie in directly to that document, OES_depth_texture definitely isn't supported on the iPad or on the third generation iPhone. I've not run a complete test elsewhere.