XNA - vertex streams?

Would someone provide/point me to an explanation of or tutorial on using multiple vertex streams in HLSL and XNA? I'm interested in how they're stored/accessed by the GPU, advantages of or uses for streams in custom shaders, etc.
I've seen a few examples on using multiple vertex streams for instanced geometry, but I'm having a hard time wrapping my head around the underlying mechanism.
Update
Suppose I have a vertex shader that takes several inputs (borrowed from this tutorial):
InstancingVSoutput InstancingVS(InstancingVSinput input, float4x4 instanceTransform : TEXCOORD0, float4 color : TEXCOORD4)
{
    InstancingVSoutput output;
    float4 pos = input.Position;
    pos = mul(pos, transpose(instanceTransform));
    pos = mul(pos, WVP);
    output.Position = pos;
    output.Color = color;
    return output;
}
In the example I pulled this from, input and instanceTransform appear to come from separate streams. In this case, however, the input stream is a list of six vertices, while the instanceTransform stream contains a much larger number of elements, each a translation matrix. This is meant to be used for instanced geometry.
I'm confused about how many times this shader gets executed: is it VertexBuffer0.VertexCount * VertexBuffer1.VertexCount? The problem with this kind of thing is that once someone has figured it out, they don't bother contributing a well-written document back to the community detailing their discovery.
Argh.

Since no one else has chimed in yet, I'll give it a go :-) This is a great thread on the App Hub forums about vertex streams:
http://forums.create.msdn.com/forums/p/46229/276901.aspx
From one of the answers:
The gist is this: different streams can have different data layouts, and your VertexDeclaration determines what data gets pulled from what stream. So, for instance, you could have one buffer that stores all your positions and one buffer which stores all your colors, and you could set those to different streams; alternatively you could munge them into a single stream, but this isn't always convenient.
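To make "your VertexDeclaration determines what data gets pulled from what stream" concrete, here is a CPU-side C model of the fetch. The VertexElement and StreamBinding types are hypothetical stand-ins for a declaration entry and a bound vertex buffer, not XNA or Direct3D types; the point is only that each element names a stream plus an offset, and the address is base + index * stride + offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one vertex declaration entry: which stream
   (bound vertex buffer) the element lives in and where inside one
   vertex of that stream it starts. */
typedef struct {
    int stream;      /* index of the bound buffer to read from */
    size_t offset;   /* byte offset inside one vertex of that stream */
    size_t size;     /* number of bytes in the element */
} VertexElement;

/* Hypothetical model of a buffer bound to a stream slot. */
typedef struct {
    const unsigned char *data; /* raw buffer contents */
    size_t stride;             /* bytes per vertex in this stream */
} StreamBinding;

/* Copies one element of vertex `index` the way an input assembler
   conceptually does: address = base + index * stride + offset. */
static void fetch_element(const StreamBinding *streams, const VertexElement *e,
                          size_t index, void *out)
{
    const StreamBinding *s = &streams[e->stream];
    assert(e->offset + e->size <= s->stride); /* element must fit in one vertex */
    memcpy(out, s->data + index * s->stride + e->offset, e->size);
}
```

With positions in stream 0 and colors in stream 1, the same vertex index reads from two unrelated buffers; an interleaved layout would instead use a single stream with a larger stride and a non-zero offset for the color element.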
Hope it helps ;-)
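On the "how many times does the shader run" question, a rough C model may help. For instancing, the geometry stream advances once per vertex and the instance stream advances once per instance; in XNA 4.0 this corresponds, roughly, to binding the instance buffer through a VertexBufferBinding with an instance frequency of 1 and calling DrawInstancedPrimitives. The loop below is a conceptual sketch (Float3, Offset and translate_vs are made-up names; Offset stands in for instanceTransform), not how a GPU actually schedules work:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Float3;
typedef struct { float dx, dy, dz; } Offset; /* stand-in for instanceTransform */

typedef void (*VertexShader)(Float3 position, Offset instance, void *user);

/* Conceptual instanced draw: stream 0 is indexed by the vertex, stream 1
   by the instance. Returns the number of vertex shader invocations. */
static size_t draw_instanced(const Float3 *geometry, size_t vertex_count,
                             const Offset *instances, size_t instance_count,
                             VertexShader vs, void *user)
{
    size_t invocations = 0;
    assert(geometry != NULL && instances != NULL);
    for (size_t i = 0; i < instance_count; ++i) {   /* per-instance stream */
        for (size_t v = 0; v < vertex_count; ++v) { /* per-vertex stream */
            vs(geometry[v], instances[i], user);
            ++invocations;
        }
    }
    return invocations;
}

/* Example "shader": translates the vertex and accumulates x so the result
   can be checked without a GPU. */
static void translate_vs(Float3 p, Offset o, void *user)
{
    float *sum_x = user;
    *sum_x += p.x + o.dx;
}
```

So the count is (vertices per instance) x (instances drawn). The instance count comes from the draw call rather than from VertexBuffer1.VertexCount, although the two are usually equal. For an indexed draw, read "vertices" as the indices of one instance; the post-transform cache may skip repeated indices.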

Related

Using Metal discard_fragment() to discard individual samples in an MSAA attachment

For an MSAA attachment, the following simple Metal fragment shader is meant to be run in multiple render passes, once per sample, to fill the stencil attachment with a different reference value for each sample. It does not work as expected: on each render pass it effectively fills all samples of every stencil pixel with the reference value.
struct _10
{
    int _m0;
};
struct main0_out
{
    float gl_FragDepth [[depth(any)]];
};
fragment main0_out main0(constant _10& _12 [[buffer(0)]], uint gl_SampleID [[sample_id]])
{
    main0_out out = {};
    if (gl_SampleID != _12._m0)
    {
        discard_fragment();
    }
    out.gl_FragDepth = 0.5;
    return out;
}
The problem seems to be using discard_fragment() on a per-sample basis. The intended operation of discarding one sample but writing another does not occur. Instead, the sample is never discarded, regardless of the comparison value passed in the buffer.
In fact, from what I can tell from the GPU capture's shader tracing results, the entire if-discard clause appears to be optimized away by the Metal compiler. My guess is that Metal recognizes the disconnect between per-sample invocations and discard_fragment() and removes it, but I can't be sure.
I can't find any Metal documentation on discard_fragment() and its use with MSAA, so I can't tell whether discard_fragment() is supposed to work with individual sample invocations in that environment, or whether it can only discard the entire fragment (which admittedly the function name implies, but what does that mean for per-sample invocations?).
Does the logic and intention of this shader make sense? Is discard_fragment() supposed to work with individual sample invocations? And why would the Metal compiler possibly be removing the discard operation from my shader?
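To spell out the intended versus observed results, here is a CPU-side C model of the passes. It assumes 4 samples, that pass p uses stencil reference value p, and that the stencil operation is REPLACE; the host code isn't shown in the question, so those are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_COUNT 4

/* Models one pixel's samples across SAMPLE_COUNT passes. With working
   per-sample discard, pass p writes only sample p; without it, every
   sample is overwritten on every pass. */
static void run_passes(int stencil[SAMPLE_COUNT], int discard_works)
{
    assert(stencil != NULL);
    for (int pass = 0; pass < SAMPLE_COUNT; ++pass) {
        int reference = pass;              /* assumed reference value per pass */
        for (int s = 0; s < SAMPLE_COUNT; ++s) {
            if (discard_works && s != pass)
                continue;                  /* sample discarded: no stencil write */
            stencil[s] = reference;        /* stencil op REPLACE */
        }
    }
}
```

With working per-sample discard, sample s ends up holding s; what the question describes instead matches the second case, where every sample ends up holding the last pass's reference value.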

Vulkan texture rendering on multiple meshes

I'm in the middle of rendering different textures on the multiple meshes of a model, but I don't have much of a clue about the procedure. Someone suggested creating a descriptor set for each mesh and calling vkCmdBindDescriptorSets() and vkCmdDrawIndexed() for each one, like this:
// Pipeline with descriptor set layout that matches the shared descriptor sets
vkCmdBindPipeline(...pipelines.mesh...);
...
// Mesh A
vkCmdBindDescriptorSets(...&meshA.descriptorSet... );
vkCmdDrawIndexed(...);
// Mesh B
vkCmdBindDescriptorSets(...&meshB.descriptorSet... );
vkCmdDrawIndexed(...);
However, that approach is quite different from NVIDIA's chopper sample and the Vulkan samples, so I have no idea where to start changing things. I'd really appreciate any help pointing me in the right direction.
Cheers
You have a conceptual object which is made of multiple meshes which have different texturing needs. The general ways to deal with this are:
1. Change descriptor sets between parts of the object. Painful, but it works on all Vulkan-capable hardware.
2. Employ array textures. Each individual mesh fetches its data from a particular layer of the array texture. Of course, this restricts every sub-mesh to textures of the same size. But it works on all Vulkan-capable hardware (the spec guarantees at least 256 array layers). The array layer for a particular mesh can be provided as a push constant, or as the base instance if that's available.
Note that if you can provide it through the base instance, you can render the entire object with a single multi-draw indirect command. Though it's not clear that a short multi-draw indirect would be faster than just baking a short sequence of draw commands into a command buffer.
3. Employ sampler arrays, as Sascha Willems suggests. Presumably, the array index for the sub-mesh is provided as a push constant or as a multi-draw's draw index. The problem is that, regardless of how that array index is provided, it will have to be a dynamically uniform expression. And Vulkan implementations are not required to allow you to index a sampler array with a dynamically uniform expression. The base requirement is just a constant expression.
This limits you to hardware that supports the shaderSampledImageArrayDynamicIndexing feature. So you have to request that feature, and if it's not available, you have to work around it with #1 or #2, or just not run on that hardware. But that last option rules out much of the mobile hardware out there, since most of it doesn't support this feature yet.
Note that I am not saying you shouldn't use this method. I just want you to be aware that there are costs. There's a lot of hardware out there that can't do this. So you need to plan for that.
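For the array-texture route with the layer passed as the base instance, the per-sub-mesh draws can be built as an indirect buffer. The sketch below mirrors the field order of VkDrawIndexedIndirectCommand in a local struct so it compiles without the Vulkan headers; SubMesh and build_draws are hypothetical names. It assumes the drawIndirectFirstInstance feature is enabled (needed for a non-zero firstInstance in indirect draws) and that the vertex shader reads the layer from gl_InstanceIndex, which in Vulkan GLSL includes the base instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as VkDrawIndexedIndirectCommand. */
typedef struct {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} DrawIndexedIndirectCommand;

/* Hypothetical description of one sub-mesh in a shared index buffer. */
typedef struct {
    uint32_t firstIndex;
    uint32_t indexCount;
    uint32_t textureLayer; /* array texture layer this sub-mesh samples */
} SubMesh;

/* One indirect command per sub-mesh; firstInstance carries the layer. */
static void build_draws(const SubMesh *meshes, uint32_t count,
                        DrawIndexedIndirectCommand *out)
{
    assert(meshes != NULL && out != NULL);
    for (uint32_t i = 0; i < count; ++i) {
        out[i].indexCount    = meshes[i].indexCount;
        out[i].instanceCount = 1;
        out[i].firstIndex    = meshes[i].firstIndex;
        out[i].vertexOffset  = 0;
        out[i].firstInstance = meshes[i].textureLayer;
    }
}
```

The resulting array would be copied into a VkBuffer and drawn with a single vkCmdDrawIndexedIndirect(cmd, buffer, 0, count, sizeof(VkDrawIndexedIndirectCommand)), with no descriptor set changes between sub-meshes.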
The person who suggested the above code fragment was me, I guess ;)
This is only one way of doing it. You don't necessarily have to create one descriptor set per mesh or per texture. If your mesh uses, e.g., 4 different textures, you could bind all of them at once to different binding points and select between them in the shader.
And if you take a look at NVIDIA's chopper sample, they do pretty much the same thing, just with more abstraction.
The example also sets up descriptor sets for the textures it uses:
VkDescriptorSet *textureDescriptors = m_renderer->getTextureDescriptorSets();
binds them a few lines later:
VkDescriptorSet sets[3] = { sceneDescriptor, textureDescriptors[0], m_transform_descriptor_set };
vkCmdBindDescriptorSets(m_draw_command[inCommandIndex], VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3, sets, 0, NULL);
and then renders the mesh with the bound descriptor sets:
vkCmdDrawIndexedIndirect(m_draw_command[inCommandIndex], sceneIndirectBuffer, 0, inCount, sizeof(VkDrawIndexedIndirectCommand));
vkCmdDraw(m_draw_command[inCommandIndex], 1, 1, 0, 0);
If you take a look at initDescriptorSets you can see that they also create separate descriptor sets for the cubemap, the terrain, etc.
The LunarG examples should work similarly, though if I'm not mistaken they never use more than one texture?

iOS Import .obj file to Model I/O without duplicating vertices

I'm trying to import a .obj file to use in SceneKit using the Model I/O framework. I initially used the simple MDLAsset initWithURL: method, but after transferring the mesh to an SCNGeometry, I realized it was triangulating the mesh so that each face had 3 unique vertices, with separate vertices at the same location for faces sharing a border. This was causing major problems with my other functions, so I tried to fix it by using MDLAsset's initWithURL:vertexDescriptor:bufferAllocator:preserveTopology: method instead, with preserveTopology set to YES and nil passed for the descriptor and allocator to get the defaults. Preserving topology fixed the duplicated vertices, so the faces/edges were all good, but in the process I lost the normal data.
By losing the normals, I don't mean multiple indexing; I mean that after setting preserveTopology to YES, the buffer did not contain any normal values at all. Whereas before it was v1/n1/v2/n2... with a stride of 24 bytes (3 components * 4 bytes/float * 2 attributes), now the first half of the buffer is v1/v2/... with a stride of 12, and the entire second half of the buffer is just 0.0 floats.
Something else is odd here: when you look at the SCNGeometrySources of the geometry, there are 2 sources, 1 with semantic kGeometrySourceSemanticVertex and 1 with semantic kGeometrySourceSemanticNormal. You would think that the vertex source would contain the position data and the normal source the normal data. However, that is not the case. Whatever preserveTopology is set to, both are buffers large enough to hold position and normal data, with identical contents. So when I said there was no normal data, I meant that both buffers, vertex AND normal, went from v1/n1/v2/n2... to v1/v2/.../(0.0, 0.0, 0.0)/(0.0, 0.0, 0.0)/... I looked into the MDLMesh's buffer (before the transfer to SceneKit) and found the same problem, so the problem must be with initWithURL, not with the Model I/O to SceneKit bridge.
So I figured there must be something wrong with the default vertex descriptor and buffer allocator (since I was passing nil) and tried to create my own that matched these 2 possible data formats. Alas, after much trying, I was unable to get something that worked.
Any ideas on how I should do this? How do I give MDLAsset the proper vertexDescriptor and bufferAllocator (I feel like nil should be OK here) for importing a .obj file? Thanks
An obj file with vertices and normals has vertices, indicated by v lines, normals, indicated by vn lines, and faces, indicated by f lines.
The v and vn lines will just be the floating-point values you expect, and the f lines will be of the form:
f v0//n0 v1//n1 etc
Since OpenGL and Metal don't allow multiple indexing, this is where you first see vertices being duplicated. For example,
f 0//0 1//2 2//0
can't work as a vertex buffer because it would require different indices per vertex. So typical OBJ parsers have to create new vertices that allow the face to become
f 0//0 1//1 2//2
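The re-indexing step a typical OBJ parser performs can be sketched in C: every distinct (position, normal) pair becomes one output vertex, and each face corner gets a single index into that list. Corner and reindex are illustrative names; the sketch uses 0-based indices like the example above (real OBJ files count from 1), and it uses a linear search where a real importer would use a hash map.

```c
#include <assert.h>
#include <stddef.h>

/* One OBJ face corner: position index and normal index ("v//n"). */
typedef struct { int v, n; } Corner;

/* Writes each distinct (v, n) pair once into unique[] and, for every
   corner, its slot in unique[] into indices[]. Returns the number of
   unique vertices, i.e. the size of the single-indexed vertex buffer. */
static size_t reindex(const Corner *corners, size_t count,
                      Corner *unique, size_t *indices)
{
    size_t unique_count = 0;
    assert(count == 0 || (corners != NULL && unique != NULL && indices != NULL));
    for (size_t i = 0; i < count; ++i) {
        size_t j = 0;
        while (j < unique_count &&
               !(unique[j].v == corners[i].v && unique[j].n == corners[i].n))
            ++j;
        if (j == unique_count)             /* new combination: emit a vertex */
            unique[unique_count++] = corners[i];
        indices[i] = j;
    }
    return unique_count;
}
```

A position that is shared by faces with different normals ends up as several vertices, which is exactly the duplication described above.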
The preserve topology option doesn't help you. It preserves the connectivity and shape of the mesh (no triangulation occurs, shared edges remain shared) but it still enforces a single index per vertex component.
One solution would be to make sure that your tool that is outputting the OBJ files uses single indexing during export, if that is an option.
Another option, though it won't solve the problem immediately, would be to file a request for multiple-indexing support at the Model I/O level. SceneKit would still have to uniquely index, because it has to be able to render.
Another option would be to use a format like PLY that doesn't have multiple indexing.

best approach to constructing an OpenGL ES 2.0 shader-based dynamic chain of filters

I'm on iOS 6 (and iOS 7, if that makes any difference) and GL ES 2.0.
The idea is for a CAEAGLLayer to have a dynamic chain of shader-based filters that processes its contents property and displays the final result. Filters can be added / removed at any point in the chain.
So far I've come up with an implementation, but I'm wondering if there are better ways to go about it. My implementation roughly works like this:
A base filter class from which concrete filters inherit, creating a shader program (vertex / fragment combo) for whatever filter / imaging they implement.
A CAEAGLLayer subclass which implements the filter chain and to which filters are added. The high-level processing algorithm is:
// 1 - Assume whenever the layer's contents property is changed to an image, a copy of the image gets stored in a sourceImage property.
// 2 - Assume changing the contents property or adding / removing an image unit triggers this algorithm.
// 3 - Assume the whole filter chain basically processes a quad with position and texture coordinates through a VBO.
// 4 - Assume all shader programs (by shader program I mean a vertex and fragment shader pair in a single program) have access to texture unit 0.
// 5 - Assume P shader programs.
load sourceImage into a texture object bound to GL_TEXTURE_2D on texture unit GL_TEXTURE0
attach the bound texture object to GL_FRAMEBUFFER at GL_COLOR_ATTACHMENT0 (so we are doing render-to-texture, which will be accessible to fragment shaders)
for p = program identifier 0 up to P - 2:
    glUseProgram(p)
    glDrawArrays()
attach GL_RENDERBUFFER to GL_FRAMEBUFFER at GL_COLOR_ATTACHMENT0 (GL_RENDERBUFFER in turn has its storage set to the layer itself)
p = program identifier P - 1 (last program in the chain)
glUseProgram(p)
glDrawArrays()
present GL_RENDERBUFFER onscreen
This approach seems to work so far, but there's a number of things I'm wondering about:
Best way to implement adding / removing of filters:
Adding and removing programs seems the most logical approach right now. However, this means one program per filter and switching between all of them at render time. I wonder how these other approaches would compare:
Attaching / detaching shader pairs and re-linking a single composite program, instead of adding / removing programs. The OpenGL ES 2.0 Programming Guide says you cannot do this. However, since desktop GL allows multiple shader objects of the same type in one program, I'm curious whether it would be a better approach if ES supported it.
Keeping the filters in text format (each filter's code in a function other than main) and compiling them all into a monolithic shader pair (with an added main, of course) each time a filter is added / removed.
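The monolithic option amounts to string assembly before glShaderSource. A minimal C sketch, assuming (hypothetically) that every filter is stored as a GLSL ES function with the signature `vec4 name(vec4 color)` and that the quad's vertex shader provides a varying named v_texCoord; Filter and compose_fragment_shader are illustrative names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical filter record: GLSL source of one "vec4 name(vec4 color)"
   function plus the function's name. */
typedef struct {
    const char *name;
    const char *source;
} Filter;

/* Writes a complete fragment shader into out: header, every filter
   function, then a generated main() that applies them in chain order.
   Returns 0 on success, -1 if out is too small. */
static int compose_fragment_shader(const Filter *filters, size_t count,
                                   char *out, size_t out_size)
{
    size_t len = 0;
    int n;
    assert(out != NULL && out_size > 0);
#define APPEND(...)                                                  \
    do {                                                             \
        n = snprintf(out + len, out_size - len, __VA_ARGS__);        \
        if (n < 0 || (size_t)n >= out_size - len) return -1;         \
        len += (size_t)n;                                            \
    } while (0)

    APPEND("precision mediump float;\n"
           "varying vec2 v_texCoord;\n"
           "uniform sampler2D u_image;\n");
    for (size_t i = 0; i < count; ++i)
        APPEND("%s\n", filters[i].source);
    APPEND("void main() {\n"
           "    vec4 color = texture2D(u_image, v_texCoord);\n");
    for (size_t i = 0; i < count; ++i)
        APPEND("    color = %s(color);\n", filters[i].name);
    APPEND("    gl_FragColor = color;\n}\n");
#undef APPEND
    return 0;
}
```

The trade-off is one compile and link per chain edit, in exchange for a single draw per frame with no intermediate textures.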
Best way to implement per-filter caching:
Right now, adding / removing any number of filters at any point in the chain requires running all programs again to render the final image. It would be nice, however, if I could somehow cache the output of each filter. That way, removing, adding or bypassing a filter would only require running the filters after the point of insertion / deletion / bypass. I can think of a naive approach: on each program pass, bind a different texture object to GL_TEXTURE0 and to the GL_COLOR_ATTACHMENT0 of the framebuffer. This way I can keep the output of every filter around. However, creating a new texture, binding it and changing the framebuffer attachment once per filter seems inefficient.
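The bookkeeping behind per-filter caching is small. In this CPU-side C sketch (all names hypothetical), a float stands in for a texture and a function pointer for a program; `valid` marks how many cached outputs are still current, and any edit at position `at` invalidates everything from `at` on, so a render only re-runs the stale tail. In GL each cache slot would be a texture attached to its own FBO, created once when the filter is added rather than on every pass.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILTERS 16

typedef float (*FilterFn)(float); /* stand-in for one filter program */

typedef struct {
    FilterFn filters[MAX_FILTERS];
    float cache[MAX_FILTERS]; /* output of filter i (texture i in GL) */
    size_t count;
    size_t valid;             /* cache[0..valid-1] are up to date */
    float source;             /* stand-in for sourceImage */
    size_t passes_run;        /* filter passes executed so far */
} FilterChain;

static void chain_insert(FilterChain *c, size_t at, FilterFn f)
{
    assert(c->count < MAX_FILTERS && at <= c->count);
    for (size_t i = c->count; i > at; --i) {
        c->filters[i] = c->filters[i - 1];
        c->cache[i] = c->cache[i - 1];
    }
    c->filters[at] = f;
    c->count++;
    if (c->valid > at) c->valid = at; /* outputs from `at` on are stale */
}

static void chain_remove(FilterChain *c, size_t at)
{
    assert(at < c->count);
    for (size_t i = at; i + 1 < c->count; ++i) {
        c->filters[i] = c->filters[i + 1];
        c->cache[i] = c->cache[i + 1];
    }
    c->count--;
    if (c->valid > at) c->valid = at;
}

/* Re-runs only the stale part of the chain and returns the final output. */
static float chain_render(FilterChain *c)
{
    for (size_t i = c->valid; i < c->count; ++i) {
        float in = (i == 0) ? c->source : c->cache[i - 1];
        c->cache[i] = c->filters[i](in);
        c->passes_run++;
    }
    c->valid = c->count;
    return c->count ? c->cache[c->count - 1] : c->source;
}

static float add_one(float x) { return x + 1.0f; } /* example filters */
static float twice(float x) { return x * 2.0f; }
```

Bypassing a filter could be handled the same way as removal: invalidate from its position and skip it during the render.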
I don't have much to say about the filter output caching problem, but as for filter switching... The EXT_separate_shader_objects extension is designed to solve this very problem, and it's supported on every device that runs iOS 5.0 or later. Here's a brief overview:
There's a new convenience API for compiling shader programs that also takes care of making them "separable":
_vertexProgram = glCreateShaderProgramvEXT(GL_VERTEX_SHADER, 1, &source);
Program Pipeline Objects manage your program state and let you mix and match already-compiled shaders:
GLuint _ppo;
glGenProgramPipelinesEXT(1, &_ppo);
glBindProgramPipelineEXT(_ppo);
glUseProgramStagesEXT(_ppo, GL_VERTEX_SHADER_BIT_EXT, _vertexProgram);
glUseProgramStagesEXT(_ppo, GL_FRAGMENT_SHADER_BIT_EXT, _fragmentProgram);
Mixing and matching shaders can make attribute binding a pain, so you can specify that in the shader (likewise for varyings):
#extension GL_EXT_separate_shader_objects : enable
layout(location = 0) attribute vec4 position;
layout(location = 1) attribute vec3 normal;
Uniforms are set for the shader program they belong to:
glProgramUniformMatrix3fvEXT(_vertexProgram, u_normalMatrix, 1, 0, _normalMatrix.m);

How to design a simple GLSL wrapper for shader use

UPDATE: Because I needed something right away, I've created a simple shader wrapper that does the sort of thing I need. You can find it here: ShaderManager on GitHub. Note that it's designed for Objective-C / iOS, so may not be useful to everyone. If you have any suggestions for design improvements, please let me know!
Original Problem:
I'm new to using GLSL shaders. I'm familiar enough with the GLSL language and the OpenGL interface, but I'm having trouble designing a simple API through which to use shaders.
OpenGL's C interface to interact with shaders seems cumbersome. I can't seem to find any tutorials on the net that cover the API design of such things.
My question is this: does any one have a good, simple, API design or pattern to wrap the OpenGL shader program API?
Take the following simple example. Say I have one vertex shader that just emulates fixed functionality, and two fragment shaders - one for drawing smooth rectangles and one for drawing smooth circles. I have the following files:
Shader.vsh : Simple vertex shader, with the following inputs/outputs:
-- Uniforms: mat4 Model, mat4 View, mat4 Projection
-- Attributes: vec4 Vertex, vec2 TexCoord, vec4 Color
-- Varying: vec4 vColor, vec2 vTexCoord
Square.fsh : Fragment shader for drawing squares based on tex coord / color
Circle.fsh : Fragment shader for drawing circles based on tex coord / color
Basic Linking
Now what is the standard way to use these? Do I link the above shaders into two OpenGL shader programs? That is:
Shader.vsh + Square.fsh = SquareProgram
Shader.vsh + Circle.fsh = CircleProgram
Or do I instead create one big program where the fragment shader checks some conditional uniform variables and calls out to a shader function to generate its result? E.g.:
Shader.vsh + Square.fsh + Circle.fsh + Main.fsh = ShaderProgram
//Main.fsh here would simply check whether to call out to square or circle
With two individual programs I would presumably need to call
glUseProgram(CircleProgram); or glUseProgram(SquareProgram);
before each type of element I want to draw. I would then need to set the uniforms (Model / View / Projection) and attributes of each program before using it. This seems so unwieldy.
With the single ShaderProgram option I would still need to set some sort of boolean switch (circle or square) in the fragment shader that would be checked before drawing each pixel. This also seems complicated.
As a side note, am I allowed to link two fragment shaders, each with a main() function, into one shader program? How would OpenGL know which one to call?
Setting Variables
The calls:
glUniform*
glVertexAttribPointer
are used to set uniforms and attribute pointer locations on the current program.
Different classes and structures may need to access and set variables on the current shader (or change the current shader) from different places in the code. I can't think of a nice way to do this that decouples the shader code from the code that wants to use it.
That is, each shape I want to draw will need to set vertex and texture coordinate attributes - requiring the handles to those attributes generated by OpenGL.
The camera will need to set its projection matrix as a uniform in the vertex shader, while the class managing the model matrix stack will need to set its own uniform in the vertex shader.
Changing shaders part-way through drawing a scene would mean that all these classes will need to set their uniforms and attributes again.
How do most people design around this?
A global dictionary of shaders accessed by handle or name, with getters and setters for their parameters?
An OO design with shader objects that each have parameters?
I've looked at the following wrappers:
Jon's Teapot: GLSL Shader Manager - This wraps shaders in C++ classes. It seems like little more than a wrapper that enforces OO principles on a C API, resulting in a C++ API that is much the same.
I am after any sort of design that simplifies the use of shader programs, and am not concerned about the particular paradigm used (OO, procedural, and so on).
I see this is tagged with iOS, so if you're partial to Objective-C, I'd take a good look at Jeff LaMarche's GLProgram wrapper class, which he describes here and has source available here. I've used it within my own applications to simplify some of the shader program setup, and to make the code a little cleaner.
For example, you can set up a shader and its attributes and uniforms using code like the following:
sphereDepthProgram = [[GLProgram alloc] initWithVertexShaderFilename:@"SphereDepth" fragmentShaderFilename:@"SphereDepth"];
[sphereDepthProgram addAttribute:@"position"];
[sphereDepthProgram addAttribute:@"inputImpostorSpaceCoordinate"];
if (![sphereDepthProgram link])
{
    NSLog(@"Depth shader link failed");
    NSString *progLog = [sphereDepthProgram programLog];
    NSLog(@"Program Log: %@", progLog);
    NSString *fragLog = [sphereDepthProgram fragmentShaderLog];
    NSLog(@"Frag Log: %@", fragLog);
    NSString *vertLog = [sphereDepthProgram vertexShaderLog];
    NSLog(@"Vert Log: %@", vertLog);
    [sphereDepthProgram release];
    sphereDepthProgram = nil;
}
sphereDepthPositionAttribute = [sphereDepthProgram attributeIndex:@"position"];
sphereDepthImpostorSpaceAttribute = [sphereDepthProgram attributeIndex:@"inputImpostorSpaceCoordinate"];
sphereDepthModelViewMatrix = [sphereDepthProgram uniformIndex:@"modelViewProjMatrix"];
sphereDepthRadius = [sphereDepthProgram uniformIndex:@"sphereRadius"];
When you need to use the shader program, you then do something like the following:
[sphereDepthProgram use];
This doesn't address the issues of branching vs. individual shaders that you bring up above, but Jeff's implementation does provide a nice encapsulation of some of the OpenGL ES boilerplate shader setup code.
Basic Linking:
There is no standard way here. There are at least 2 general approaches:
Monolithic - one shader covers many cases, using uniform boolean switches. These branches don't hurt performance much, because the condition evaluates the same way for every fragment group (in fact, for every fragment in the draw).
Multi-object program compositing - the main shader declares a set of entry points (like 'get_diffuse', 'get_specular', etc.), which are implemented in separate attached shader objects. This implies an individually linked program for each object, but any kind of caching helps.
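The "any kind of caching helps" point can be as simple as keying linked programs by the shader combination they were built from, so each combination is compiled and linked once. A C sketch under that assumption; ProgramCache, program_for and fake_link are hypothetical, and fake_link stands in for the real compile / attach / glLinkProgram sequence:

```c
#include <assert.h>
#include <string.h>

#define MAX_PROGRAMS 32

/* Stand-in for compile + attach + glLinkProgram; returns a handle. */
typedef unsigned (*LinkFn)(const char *vertex, const char *fragment);

typedef struct {
    const char *vertex, *fragment; /* caller keeps these strings alive */
    unsigned program;
} CacheEntry;

typedef struct {
    CacheEntry entries[MAX_PROGRAMS];
    size_t count;
    LinkFn link;
} ProgramCache;

/* Returns the program for a shader combination, linking it on first use. */
static unsigned program_for(ProgramCache *c, const char *vertex, const char *fragment)
{
    for (size_t i = 0; i < c->count; ++i)
        if (strcmp(c->entries[i].vertex, vertex) == 0 &&
            strcmp(c->entries[i].fragment, fragment) == 0)
            return c->entries[i].program;   /* hit: reuse the linked program */
    assert(c->count < MAX_PROGRAMS);
    unsigned p = c->link(vertex, fragment); /* miss: link exactly once */
    c->entries[c->count++] = (CacheEntry){vertex, fragment, p};
    return p;
}

/* Fake linker for testing: hands out increasing handles. */
static unsigned next_handle = 1;
static unsigned fake_link(const char *v, const char *f)
{
    (void)v;
    (void)f;
    return next_handle++;
}
```

With the question's files, "Shader.vsh" plus "Square.fsh" and "Shader.vsh" plus "Circle.fsh" each get linked once and are then just looked up before glUseProgram.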
Setting Variables: Uniforms
I will just describe the approach I developed.
Each shader program has a list of uniform dictionaries. It's used to fill the uniform source list when the program is (re-)linked. When the program is activated, it goes through the uniform list, fetches the values from their sources and uploads them to GL. As a result, the data is not directly tied to the user's shader program, and whatever manages the data does not care which program uses it.
One of these dictionaries can be, for example, a core one, containing the model and view transformations, the camera projection and maybe something else.
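A minimal C sketch of that idea, with hypothetical names throughout: publishers (a camera, a model stack) expose named values in a UniformDict, a program resolves its uniform names against the dictionaries after linking, and on activation it pulls the current values and uploads them. record_upload stands in for the glUniform* call, and the locations stand in for what glGetUniformLocation would return.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 8

typedef struct {
    const char *name;
    const float *value;  /* points at data owned by the publisher */
} UniformSource;

typedef struct {
    UniformSource entries[MAX_ENTRIES];
    size_t count;
} UniformDict;

typedef void (*UploadFn)(int location, const float *value);

typedef struct {
    const char *names[MAX_ENTRIES];  /* uniforms the program declares */
    int locations[MAX_ENTRIES];      /* stand-ins for glGetUniformLocation */
    const float *bound[MAX_ENTRIES]; /* resolved sources, filled on (re-)link */
    size_t count;
} Program;

/* After (re-)linking: find each uniform's source in the dictionaries. */
static void program_resolve(Program *p, const UniformDict *const *dicts, size_t ndicts)
{
    assert(p->count <= MAX_ENTRIES);
    for (size_t u = 0; u < p->count; ++u) {
        p->bound[u] = NULL;
        for (size_t d = 0; d < ndicts && !p->bound[u]; ++d)
            for (size_t e = 0; e < dicts[d]->count; ++e)
                if (strcmp(dicts[d]->entries[e].name, p->names[u]) == 0) {
                    p->bound[u] = dicts[d]->entries[e].value;
                    break;
                }
    }
}

/* On activation: pull current values and upload them. Returns the count. */
static size_t program_activate(const Program *p, UploadFn upload)
{
    size_t uploaded = 0;
    for (size_t u = 0; u < p->count; ++u)
        if (p->bound[u]) {
            upload(p->locations[u], p->bound[u]);
            ++uploaded;
        }
    return uploaded;
}

/* Fake upload for testing: records the last value per location. */
static float last_uploaded[MAX_ENTRIES];
static void record_upload(int location, const float *value)
{
    assert(location >= 0 && location < MAX_ENTRIES);
    last_uploaded[location] = *value;
}
```

Because the program stores pointers rather than copies, a publisher only updates its own value; the next activation of any program that uses it picks the new value up.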
Setting Variables: Attributes
First of all, the shader program is an attribute consumer, so it is what has to extract these attributes from a mesh (or any other data storage) and upload them to GL the way it needs them. It should also make sure that the types of the provided attributes match the requested types.
With the monolithic shader approach, an unpleasant situation is possible: one of the disabled branches requires a vertex attribute that is not provided. I would advise using another attribute's data to supply the missing one, because we don't care about the actual values in this case.
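The lookup with that fallback could look like this C sketch. MeshAttribute and find_attribute are hypothetical names, and the component count stands in for the full type check: a missing attribute borrows the data of another attribute with the same component count, and an attribute with no compatible source at all is reported as NULL.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *name;
    int components;   /* e.g. 3 for vec3; stands in for the full type */
    const float *data;
} MeshAttribute;

/* Returns the data to bind for `name`. If the mesh lacks it, returns the
   first attribute with the same component count and sets *used_fallback,
   since a disabled branch never reads the values. NULL if none fits. */
static const float *find_attribute(const MeshAttribute *attrs, size_t count,
                                   const char *name, int components,
                                   int *used_fallback)
{
    const float *fallback = NULL;
    assert(used_fallback != NULL);
    *used_fallback = 0;
    for (size_t i = 0; i < count; ++i) {
        if (attrs[i].components != components)
            continue;                 /* type mismatch: never bind it */
        if (strcmp(attrs[i].name, name) == 0)
            return attrs[i].data;     /* exact match */
        if (!fallback)
            fallback = attrs[i].data;
    }
    if (fallback)
        *used_fallback = 1;
    return fallback;
}
```

Whether a fallback is acceptable is the shader's call: it is only safe for attributes that feed a branch disabled by the current uniforms.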
P.S.
You can find an actual implementation of these ideas here: http://code.google.com/p/kri/
