WebGL shader: save multiple 32-bit values

I need to save up to 8 32-bit values in each WebGL fragment shader invocation (including when the OES_texture_float or OES_texture_half_float extensions are not available). It seems I can only store a single 32-bit value by packing it into a 4x8-bit RGBA gl_FragColor.
Is there a way to store 8 values?

The only way to write more than one vec4 worth of data per fragment shader invocation is to use WEBGL_draw_buffers, which lets you bind multiple color attachments to a framebuffer and then render to all of them in a single fragment shader call using
gl_FragData[constAttachmentIndex] = result;
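For reference, here is a rough sketch of the JavaScript setup side for that approach (the attachment count, width, and height are placeholders, and the fragment shader additionally needs #extension GL_EXT_draw_buffers : require at the top):
const ext = gl.getExtension('WEBGL_draw_buffers');
if (ext) {
  const fb = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
  const attachments = [];
  for (let i = 0; i < 2; ++i) {  // one RGBA8 texture per color attachment
    const tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
                  gl.RGBA, gl.UNSIGNED_BYTE, null);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, ext.COLOR_ATTACHMENT0_WEBGL + i,
                            gl.TEXTURE_2D, tex, 0);
    attachments.push(ext.COLOR_ATTACHMENT0_WEBGL + i);
  }
  ext.drawBuffersWEBGL(attachments);  // maps gl_FragData[i] to attachment i
}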
If WEBGL_draw_buffers is not available, the only workarounds I can think of are:
Rendering in multiple draw calls.
Call gl.drawArrays to render the first vec4, then again with different parameters or a different shader to render the second vec4.
Render based on gl_FragCoord where you change the output for each pixel.
In other words, the 1st pixel gets the first vec4, the second pixel gets the second vec4, etc. For example:
float mode = mod(gl_FragCoord.x, 2.);
gl_FragColor = mix(result1, result2, mode);
In this way the results are stored like this
1212121212
1212121212
1212121212
into one texture. For more vec4s you could do this
float mode = mod(gl_FragCoord.x, 4.); // 4 vec4s
if (mode < 0.5) {
gl_FragColor = result1;
} else if (mode < 1.5) {
gl_FragColor = result2;
} else if (mode < 2.5) {
gl_FragColor = result3;
} else {
gl_FragColor = result4;
}
This may or may not be faster than method #1. Your shader is more complicated because it's potentially doing the calculations for both result1 and result2 for every pixel, but depending on the GPU and pipelining you might get some of that for free.
As for getting 32-bit values out even if there's no OES_texture_float, you're basically going to have to write out even more 8-bit values using one of the 3 techniques above.
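If you do that, the readback side in JavaScript is simple. Here is a sketch that assumes the fragment shader packs each float's raw IEEE-754 bytes into R, G, B, A (low byte in R) and that the client is little-endian:
const pixels = new Uint8Array(width * height * 4);
gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
const values = new Float32Array(pixels.buffer);  // reinterpret each pixel's 4 bytes as one float
The packing itself takes some care in GLSL ES 1.0 since there are no bitwise operators; a fixed-point encoding with a matching decode on the JavaScript side is a common alternative.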
In WebGL2, draw buffers are a required feature, whereas they're optional in WebGL1. In WebGL2 there's also transform feedback, which writes the outputs of a vertex shader (the varyings) into buffers.
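As a rough WebGL2 sketch only (the varying name result, numVertices, and the points draw are placeholders), capturing one vec4 per vertex with transform feedback looks like this:
gl.transformFeedbackVaryings(program, ['result'], gl.SEPARATE_ATTRIBS);
gl.linkProgram(program);  // the varying list takes effect at link time
gl.useProgram(program);

const tfBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, tfBuffer);
gl.bufferData(gl.ARRAY_BUFFER, numVertices * 4 * 4, gl.STATIC_READ);  // room for one vec4 per vertex
gl.bindBuffer(gl.ARRAY_BUFFER, null);  // a TF buffer must not stay bound elsewhere while drawing

gl.enable(gl.RASTERIZER_DISCARD);  // we only want the varyings, not rasterization
gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, tfBuffer);
gl.beginTransformFeedback(gl.POINTS);
gl.drawArrays(gl.POINTS, 0, numVertices);
gl.endTransformFeedback();
gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, null);
gl.disable(gl.RASTERIZER_DISCARD);

const results = new Float32Array(numVertices * 4);
gl.bindBuffer(gl.ARRAY_BUFFER, tfBuffer);
gl.getBufferSubData(gl.ARRAY_BUFFER, 0, results);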

Related

Draw RGB pixel array to DirectX-11 render view

Given an array of RGB pixels that updates every frame (e.g. 1024x1024), an ID3D11RenderTargetView, ID3D11Device and ID3D11DeviceContext, what's the easiest way to draw these pixels to the render view?
I've been working the angle of creating a vertex buffer for a square (two triangles), trying to make the pixels into a proper texture, and figuring out how to make a shader reference the texture sampler. I've been following this tutorial: https://learn.microsoft.com/en-us/windows/uwp/gaming/applying-textures-to-primitives. But to be honest, I don't see how this tutorial has shaders that even reference the texture data (the shaders are defined in the preceding tutorial, here).
I am a total DirectX novice, but I am writing a plugin for an application where I am given a DirectX 11 device/view/context and need to fill it with my pixel data. Many thanks!
IF you can make sure your staging resource matches the exact resolution and format of the render target you are given:
Create a staging resource
Map the staging resource, and copy your data into it.
Unmap the staging resource
Use GetResource on the RTV to get the resource
CopyResource from your staging to that resource.
Otherwise, IF you can count on Direct3D Hardware Feature level 10.0 or better, the easiest way would be:
Create a texture with USAGE_DYNAMIC.
Map it and copy your data into the texture.
Unmap the resource
Render the dynamic texture as a 'full-screen' quad using the 'big-triangle' self-generation trick in the vertex shader:
SamplerState PointSampler : register(s0);
Texture2D<float4> Texture : register(t0);
struct Interpolators
{
float4 Position : SV_Position;
float2 TexCoord : TEXCOORD0;
};
Interpolators main(uint vI : SV_VertexId)
{
Interpolators output;
// We use the 'big triangle' optimization so you only Draw 3 vertices instead of 4.
float2 texcoord = float2((vI << 1) & 2, vI & 2);
output.TexCoord = texcoord;
output.Position = float4(texcoord.x * 2 - 1, -texcoord.y * 2 + 1, 0, 1);
return output;
}
and a pixel shader of:
float4 main(Interpolators In) : SV_Target0
{
return Texture.Sample(PointSampler, In.TexCoord);
}
Then draw with:
ID3D11ShaderResourceView* textures[1] = { texture };
context->PSSetShaderResources(0, 1, textures);
// You need a sampler object.
context->PSSetSamplers(0, 1, &sampler);
// Depending on your desired result, you may need state objects here
context->OMSetBlendState(nullptr, nullptr, 0xffffffff);
context->OMSetDepthStencilState(nullptr, 0);
context->RSSetState(nullptr);
context->IASetInputLayout(nullptr);
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
context->Draw(3, 0);
For full source for the "Full Screen Quad" drawing, see GitHub.

Optimizing branching for lookup tables

Branching in WebGL seems to be something like the following (paraphrased from various articles):
The shader executes its code in parallel, and if it needs to evaluate whether a condition is true before continuing (e.g. with an if statement) then it must diverge and somehow communicate with the other threads in order to come to a conclusion.
Maybe that's a bit off - but ultimately, it seems like the problem with branching in shaders is when each thread may be seeing different data. Therefore, branching with uniforms-only is typically okay, whereas branching on dynamic data is not.
Question 1: Is this correct?
Question 2: How does this relate to something that's fairly predictable but not a uniform, such as an index in a loop?
Specifically, I have the following function:
vec4 getMorph(int morphIndex) {
/* doesn't work - can't access morphs via dynamic index
vec4 morphs[8];
morphs[0] = a_Morph_0;
morphs[1] = a_Morph_1;
...
morphs[7] = a_Morph_7;
return morphs[morphIndex];
*/
//need to do this:
if(morphIndex == 0) {
return a_Morph_0;
} else if(morphIndex == 1) {
return a_Morph_1;
}
...
else if(morphIndex == 7) {
return a_Morph_7;
}
}
And I call it in something like this:
for(int i = 0; i < 8; i++) {
pos += weight * getMorph(i);
normal += weight * getMorph(i);
...
}
Technically, it works fine - but my concern is all the if/else branches based on the dynamic index. Is that going to slow things down in a case like this?
For the sake of comparison, though it's tricky to explain in a few concise words here - I have an alternative idea to always run all the calculations for each attribute. This would involve potentially 24 superfluous vec4 += float * vec4 calculations per vertex. Would that be better or worse than branching 8 times on an index, usually?
Note: in my actual code there are a few more levels of mapping and indirection. While it does boil down to the same getMorph(i) question, my use case involves getting that index both from an index in a loop and from a lookup of that index in a uniform integer array.
I know this is not a direct answer to your question but ... why not just not use a loop?
vec3 pos = weight[0] * a_Morph_0 +
weight[1] * a_Morph_1 +
weight[2] * a_Morph_2 ...
If you want generic code (i.e. where you can set the number of morphs), then either get creative with #if, #else, #endif:
const numMorphs = ?
const shaderSource = `
...
#define NUM_MORPHS ${numMorphs}
vec3 pos = weight[0] * a_Morph_0
#if NUM_MORPHS >= 1
+ weight[1] * a_Morph_1
#endif
#if NUM_MORPHS >= 2
+ weight[2] * a_Morph_2
#endif
;
...
`;
or generate the shader in JavaScript with string manipulation.
function createMorphShaderSource(numMorphs) {
const morphStrs = [];
for (let i = 1; i < numMorphs; ++i) {
morphStrs.push(`+ weight[${i}] * a_Morph_${i}`);
}
return `
..shader code..
${morphStrs.join('\n')}
..shader code..
`;
}
Shader generation through string manipulation is a normal thing to do. You'll find all major 3d libraries do this (three.js, unreal, unity, pixi.js, playcanvas, etc...)
As for whether or not branching is slow, it really depends on the GPU, but the general rule is that yes, it's slower no matter how it's done.
You generally can avoid branches by writing custom shaders instead of trying to be generic.
Instead of
uniform bool haveTexture;
if (haveTexture) {
...
} else {
...
}
Just write 2 shaders. One with a texture and one without.
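One simple way to get those two shaders from a single source is to inject a toggle from JavaScript when you build the shader string; a sketch (the names v_color, u_texture, and v_texcoord are illustrative):
const fragmentSource = (haveTexture) => `
precision mediump float;
varying vec4 v_color;
${haveTexture ? 'uniform sampler2D u_texture; varying vec2 v_texcoord;' : ''}
void main() {
  gl_FragColor = ${haveTexture
      ? 'texture2D(u_texture, v_texcoord) * v_color'
      : 'v_color'};
}
`;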
Another way to avoid branches is to get creative with your math. For example, let's say we want to support vertex colors or textures:
varying vec4 vertexColor;
uniform sampler2D textureColor;
...
vec4 tcolor = texture2D(textureColor, ...);
gl_FragColor = tcolor * vertexColor;
Now, when we want just a vertex color, set textureColor to a 1x1 pixel white texture. When we want just a texture, turn off the attribute for vertexColor and set that attribute to white with gl.vertexAttrib4f(vertexColorAttributeLocation, 1, 1, 1, 1); and as a bonus, we can modulate the texture with vertex colors by supplying both a texture and vertex colors.
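Here is what that looks like on the JavaScript side (a sketch; program and the attribute name a_color are placeholders):
// 1x1 white texture used when no real texture is supplied
const whiteTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, whiteTex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, 1, 1, 0,
              gl.RGBA, gl.UNSIGNED_BYTE, new Uint8Array([255, 255, 255, 255]));

// when there are no per-vertex colors, disable the attribute and give it a constant white value
const vertexColorAttributeLocation = gl.getAttribLocation(program, 'a_color');
gl.disableVertexAttribArray(vertexColorAttributeLocation);
gl.vertexAttrib4f(vertexColorAttributeLocation, 1, 1, 1, 1);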
Similarly, we could pass in a 0 or a 1 to multiply certain things by 0 or 1 to remove their influence. In your morph example, a 3D engine that is targeting performance would generate shaders for different numbers of morphs. A 3D engine that didn't care about performance would have one shader that supported N morph targets and would just set the weight of any unused targets to 0.
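For example, building on the createMorphShaderSource helper above, a performance-minded engine might cache one compiled program per morph count instead of branching in the shader (compileProgram here is an assumed helper, not defined in this answer):
const programCache = new Map();
function getMorphProgram(gl, numMorphs) {
  let program = programCache.get(numMorphs);
  if (!program) {
    // compile a shader that sums exactly numMorphs targets, with no branches
    program = compileProgram(gl, createMorphShaderSource(numMorphs));
    programCache.set(numMorphs, program);
  }
  return program;
}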
Yet another way to avoid branching is the step function, which is defined as
float step(float edge, float x) {
return x < edge ? 0.0 : 1.0;
}
So you can choose a or b with
v = mix(a, b, step(edge, x));

Fragment shader output interferes with conditional statement

Context: I'm doing all of the following using OpenGLES 2 on iOS 11
While implementing different blend modes used to blend two textures together I came across a weird issue that I managed to reduce to the following:
I'm trying to blend the following two textures together, only using the fragment shader and not the OpenGL blend functions or equations. GL_BLEND is disabled.
Bottom - dst:
Top - src:
(The bottom image is the same as the top image but rotated and blended onto an opaque white background using "normal" (as in Photoshop 'normal') blending)
In order to do the blending I use the
#extension GL_EXT_shader_framebuffer_fetch
extension, so that in my fragment shader I can write:
void main()
{
highp vec4 dstColor = gl_LastFragData[0];
highp vec4 srcColor = texture2D(textureUnit, textureCoordinateInterpolated);
gl_FragColor = blend(srcColor, dstColor);
}
The blend() function doesn't perform any blending itself. It only chooses the correct blend function to apply based on a uniform blendMode integer value. In this case the first texture gets drawn with an already-tested normal blending function, and then the second texture gets drawn on top with the following blendTest function:
Now here's where the problem comes in:
highp vec4 blendTest(highp vec4 srcColor, highp vec4 dstColor) {
highp float threshold = 0.7; // arbitrary
highp float g = 0.0;
if (dstColor.r > threshold && srcColor.r > threshold) {
g = 1.0;
}
//return vec4(0.6, g, 0.0, 1.0); // no yellow lines (Case 1)
return vec4(0.8, g, 0.0, 1.0); // shows yellow lines (Case 2)
}
This is the output I would expect (made in Photoshop):
So red everywhere and green/yellow in the areas where both textures contain an amount of red that is larger than the arbitrary threshold.
However, the results I get are for some reason dependent on the output value I choose for the red component (0.6 or 0.8) and none of these outputs matches the expected one.
Here's what I see (The grey border is just the background):
Case 1:
Case 2:
So to summarize: If I return a red value that is larger than the threshold, e.g
return vec4(0.8, g, 0.0, 1.0);
I see vertical yellow lines, whereas if the red component is less than the threshold there will be no yellow/green in the result whatsoever.
Question:
Why does the output of my fragment shader determine whether or not the conditional statement is executed and even then, why do I end up with green vertical lines instead of green boxes (which indicates that the dstColor is not being read properly)?
Does it have to do with the extension that I'm using?
I also want to point out that the textures are both being loaded and bound properly. I can see them just fine if I just return the individual texture info without blending, and even with a normal blending function that I've implemented, everything works as expected.
I found out what the problem was (and I realize that it's not something anyone could have known from just reading the question):
There is an additional fully transparent texture being drawn between the two textures you can see above, which I had forgotten about.
Instead of accounting for that and just returning the dstColor when the srcColor alpha is 0, the transparent texture's color information (which is (0.0, 0.0, 0.0, 0.0)) was being used when blending, thereby altering the framebuffer content.
Both the transparent texture and the final texture were drawn with the blendTest function, so the output of the first function call was then being read in when blending the final texture.

How to get the texture size in HLSL?

For an HLSL shader I'm working on (for practice), I'm trying to execute a part of the code if the texture coordinates (on a model) are above half the respective size (that is, x > width / 2 or y > height / 2). I'm familiar with C/C++ and know the basics of HLSL (the very basics). If no other solution is possible, I will set the texture size manually with XNA (in which I'm using the shader, as a matter of fact). Is there a better solution? I'm trying to remain within Shader Model 2.0 if possible.
The default texture coordinate space is normalized to 0..1, so x > width / 2 should simply be texcoord.x > 0.5.
Be careful here. tex2D() and other texture calls should NOT be within if()/else clauses. So if you have a pixel shader input "IN.UV" and you're aiming at "OUT.color", you need to do it this way:
float4 aboveCol = tex2D(mySampler, some_texcoords);
float4 belowCol = tex2D(mySampler, some_other_texcoords);
if (IN.UV.x >= 0.5) {
OUT.color = /* some function of... */ aboveCol;
} else {
OUT.color = /* some function of... */ belowCol;
}
rather than putting the tex2D() calls inside the if() blocks.

Efficient way to render a bunch of layered textures?

What's the efficient way to render a bunch of layered textures? I have some semitransparent textured rectangles that I position randomly in 3D space and render them from back to front.
Currently I call d3dContext->PSSetShaderResources() to feed the pixel shader with a new texture before each call to d3dContext->DrawIndexed(). I have a feeling that I am copying the texture to GPU memory before each draw. I might have 10-30 ARGB textures, roughly 1024x1024 pixels each, and they are associated with the 100-200 rectangles that I render on screen. My FPS is OK at 100 rectangles, but gets pretty bad around 200. I possibly have some inefficiencies elsewhere since this is my first semi-serious D3D code, but I strongly suspect this has to do with copying the textures back and forth. 30*1024*1024*4 is 120MB, which is a bit high for a Metro Style App that should target any Windows 8 device. So putting them all in there might be a stretch, but maybe I could at least cache a few somehow? Any ideas?
EDIT - Some code snippets added
Constant Buffer
struct ModelViewProjectionConstantBuffer
{
DirectX::XMMATRIX model;
DirectX::XMMATRIX view;
DirectX::XMMATRIX projection;
float opacity;
float3 highlight;
float3 shadow;
float textureTransitionAmount;
};
The Render Method
void RectangleRenderer::Render()
{
// Clear background and depth stencil
const float backgroundColorRGBA[] = { 0.35f, 0.35f, 0.85f, 1.000f };
m_d3dContext->ClearRenderTargetView(
m_renderTargetView.Get(),
backgroundColorRGBA
);
m_d3dContext->ClearDepthStencilView(
m_depthStencilView.Get(),
D3D11_CLEAR_DEPTH,
1.0f,
0
);
// Don't draw anything else until all textures are loaded
if (!m_loadingComplete)
return;
m_d3dContext->OMSetRenderTargets(
1,
m_renderTargetView.GetAddressOf(),
m_depthStencilView.Get()
);
UINT stride = sizeof(BasicVertex);
UINT offset = 0;
// The vertex buffer only has the 4 vertices of a rectangle
m_d3dContext->IASetVertexBuffers(
0,
1,
m_vertexBuffer.GetAddressOf(),
&stride,
&offset
);
// The index buffer only references those 4 vertices
m_d3dContext->IASetIndexBuffer(
m_indexBuffer.Get(),
DXGI_FORMAT_R16_UINT,
0
);
m_d3dContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
m_d3dContext->IASetInputLayout(m_inputLayout.Get());
FLOAT blendFactors[4] = { 0, };
m_d3dContext->OMSetBlendState(m_blendState.Get(), blendFactors, 0xffffffff);
m_d3dContext->VSSetShader(
m_vertexShader.Get(),
nullptr,
0
);
m_d3dContext->PSSetShader(
m_pixelShader.Get(),
nullptr,
0
);
m_d3dContext->PSSetSamplers(
0, // starting at the first sampler slot
1, // set one sampler binding
m_sampler.GetAddressOf()
);
// number of rectangles is in the 100-200 range
for (int i = 0; i < m_rectangles.size(); i++)
{
// start rendering from the farthest rectangle
int j = (i + m_farthestRectangle) % m_rectangles.size();
m_vsConstantBufferData.model = m_rectangles[j].transform;
m_vsConstantBufferData.opacity = m_rectangles[j].Opacity;
m_vsConstantBufferData.highlight = m_rectangles[j].Highlight;
m_vsConstantBufferData.shadow = m_rectangles[j].Shadow;
m_vsConstantBufferData.textureTransitionAmount = m_rectangles[j].textureTransitionAmount;
m_d3dContext->UpdateSubresource(
m_vsConstantBuffer.Get(),
0,
NULL,
&m_vsConstantBufferData,
0,
0
);
m_d3dContext->VSSetConstantBuffers(
0,
1,
m_vsConstantBuffer.GetAddressOf()
);
m_d3dContext->PSSetConstantBuffers(
0,
1,
m_vsConstantBuffer.GetAddressOf()
);
auto a = m_rectangles[j].textureId;
auto b = m_rectangles[j].targetTextureId;
auto srv1 = m_textures[m_rectangles[j].textureId].textureSRV.GetAddressOf();
auto srv2 = m_textures[m_rectangles[j].targetTextureId].textureSRV.GetAddressOf();
ID3D11ShaderResourceView* srvs[2];
srvs[0] = *srv1;
srvs[1] = *srv2;
m_d3dContext->PSSetShaderResources(
0, // starting at the first shader resource slot
2, // set two shader resource bindings
srvs
);
m_d3dContext->DrawIndexed(
m_indexCount,
0,
0
);
}
}
Pixel Shader
cbuffer ModelViewProjectionConstantBuffer : register(b0)
{
matrix model;
matrix view;
matrix projection;
float opacity;
float3 highlight;
float3 shadow;
float textureTransitionAmount;
};
Texture2D baseTexture : register(t0);
Texture2D targetTexture : register(t1);
SamplerState simpleSampler : register(s0);
struct PixelShaderInput
{
float4 pos : SV_POSITION;
float3 norm : NORMAL;
float2 tex : TEXCOORD0;
};
float4 main(PixelShaderInput input) : SV_TARGET
{
float3 lightDirection = normalize(float3(0, 0, -1));
float4 baseTexelColor = baseTexture.Sample(simpleSampler, input.tex);
float4 targetTexelColor = targetTexture.Sample(simpleSampler, input.tex);
float4 texelColor = lerp(baseTexelColor, targetTexelColor, textureTransitionAmount);
float4 shadedColor;
shadedColor.rgb = lerp(shadow.rgb, highlight.rgb, texelColor.r);
shadedColor.a = texelColor.a * opacity;
return shadedColor;
}
As Jeremiah has suggested, you are probably not moving textures from the CPU to the GPU each frame, as that would require creating a new texture each frame or using the UpdateSubresource or Map/Unmap methods.
I don't think that instancing is going to help in this specific case, as the number of polygons is extremely low (I would start to worry at several million polygons). It is more likely that your application is going to be bandwidth/fillrate limited, as you are performing lots of texture sampling/blending (it depends on the texture fillrate, pixel fillrate and the number of ROPs on your GPU).
In order to achieve better performance, it is highly recommended to:
Make sure that all your textures have their mipmaps generated. If they don't have any mipmaps, it will hurt the GPU's texture cache a lot. (I also assume that you are using the texture.Sample method in HLSL, and not texture.SampleLevel or variants.)
Use Direct3D 11 block-compressed textures on the GPU, by using a tool like texconv.exe or preferably the sample from the "Windows DirectX 11 Texture Converter".
On a side note, you will probably get more attention for this kind of question on https://gamedev.stackexchange.com/.
I don't think you are doing any copying back and forth from the GPU to system memory. You usually have to do that explicitly with a call to Map(...), or by blitting to a texture you created in system memory.
One issue is that you are making a DrawIndexed(...) call for each texture. GPUs work most efficiently if you send them a bunch of work to do by batching. One way to accomplish this is to set n textures with PSSetShaderResources(i, ...) and do a DrawIndexedInstanced(...). Your shader code would then read each of the shader resources and draw them. I do this in my C++ DirectCanvas code here (SpriteInstanced.cpp). This can make for a lot of code, but the result is very efficient (I even do the matrix ops in the shader for more speed).
One other, maybe a lot easier, way is to give the DirectXTK SpriteBatch a shot.
I used it here in this project, only for a simple blit, but it may be a good start to see the minor amount of setup needed to use the SpriteBatch.
Also, if possible, try to 'atlas' your textures. For instance, fit as many images into one texture as possible and blit from that, versus having a single texture for each image.
