I completed a tutorial on rendering 2D triangles in DirectX. Now I want to use my knowledge of rendering a single triangle to render multiple triangles, or for that matter, multiple objects on screen.
Should I create a list/stack/vector of vertex buffers and input layouts and then draw each object? Or is there a better approach?
My process would be:
Setup directx, including vertex and pixel shaders
Create vertex buffers for each shape that has to be drawn on the screen and store them in an array.
Draw them to the render target (each frame)
Present the render target (each frame)
Please assume very rudimentary knowledge of DirectX and graphics programming in general when answering.
You don't need to create a vertex buffer for each shape. You can create one vertex buffer to store all the vertices of all the triangles, create one index buffer to store all the indices of all the shapes, and then draw them using the index buffer.
I am not familiar with DX11, so I'll just list the Direct3D 9 links for your reference; the concepts are the same, just with some API changes.
Index Buffers (Direct3D 9)
Rendering from Vertex and Index Buffers
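For DX11 specifically, a minimal sketch of that idea might look like the following. This is not from the links above; the Vertex struct and function names are hypothetical, and it assumes a device, an immediate context, and an input layout matching the Vertex struct already exist.
#include <d3d11.h>
#include <vector>

struct Vertex { float x, y, z; float r, g, b, a; };   // whatever your input layout expects

// Create one vertex buffer and one index buffer holding every shape.
void CreateCombinedBuffers(ID3D11Device* device,
                           const std::vector<Vertex>& allVertices,
                           const std::vector<UINT>& allIndices,
                           ID3D11Buffer** vertexBuffer,
                           ID3D11Buffer** indexBuffer)
{
    D3D11_BUFFER_DESC vbDesc = {};
    vbDesc.Usage = D3D11_USAGE_DEFAULT;
    vbDesc.ByteWidth = UINT(allVertices.size() * sizeof(Vertex));
    vbDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
    D3D11_SUBRESOURCE_DATA vbData = { allVertices.data() };
    device->CreateBuffer(&vbDesc, &vbData, vertexBuffer);

    D3D11_BUFFER_DESC ibDesc = {};
    ibDesc.Usage = D3D11_USAGE_DEFAULT;
    ibDesc.ByteWidth = UINT(allIndices.size() * sizeof(UINT));
    ibDesc.BindFlags = D3D11_BIND_INDEX_BUFFER;
    D3D11_SUBRESOURCE_DATA ibData = { allIndices.data() };
    device->CreateBuffer(&ibDesc, &ibData, indexBuffer);
}

// Each frame: bind once, then draw everything with a single call.
void DrawCombined(ID3D11DeviceContext* context,
                  ID3D11Buffer* vertexBuffer, ID3D11Buffer* indexBuffer, UINT indexCount)
{
    UINT stride = sizeof(Vertex), offset = 0;
    context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
    context->IASetIndexBuffer(indexBuffer, DXGI_FORMAT_R32_UINT, 0);
    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    context->DrawIndexed(indexCount, 0, 0);
}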
If the triangles are all the same shape, just with different positions or colors, you can consider using geometry instancing; it's a powerful way to render multiple copies of the same geometry.
Geometry Instancing
Efficiently Drawing Multiple Instances of Geometry (D3D9)
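On the DX11 side, the host code for instancing is roughly: keep the shared geometry in slot 0, put per-instance data in a second vertex buffer in slot 1 (declared with D3D11_INPUT_PER_INSTANCE_DATA elements in the input layout), and issue a single DrawIndexedInstanced call. A hedged sketch, assuming the buffers and layout already exist; the names are mine.
// Input layout: per-vertex elements read from slot 0, per-instance elements from slot 1,
// e.g. { "INSTANCEPOS", 0, DXGI_FORMAT_R32G32B32_FLOAT, 1, 0, D3D11_INPUT_PER_INSTANCE_DATA, 1 }.
void DrawInstancedTriangles(ID3D11DeviceContext* context,
                            ID3D11Buffer* meshVB, UINT meshStride,
                            ID3D11Buffer* instanceVB, UINT instanceStride,
                            ID3D11Buffer* indexBuffer, UINT indexCount, UINT instanceCount)
{
    ID3D11Buffer* buffers[2] = { meshVB, instanceVB };
    UINT strides[2] = { meshStride, instanceStride };
    UINT offsets[2] = { 0, 0 };
    context->IASetVertexBuffers(0, 2, buffers, strides, offsets);
    context->IASetIndexBuffer(indexBuffer, DXGI_FORMAT_R32_UINT, 0);
    context->DrawIndexedInstanced(indexCount, instanceCount, 0, 0, 0);
}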
I don't know much about DirectX, but the general rule when rendering on the GPU is to use separate vertex and index buffers for every mesh.
That said, there is nothing stopping you from using a single vertex buffer with many index buffers; in fact, you may see some performance gains, especially for small meshes...
You'll need just one vertex buffer to do this, and then you batch the draws.
Here is what you can do: make an array/vector holding the triangle information, say (pseudo-code):
struct TriangleInfo {
    ID3D11ShaderResourceView* texture;   // deduced from the PSSetShaderResources call below
    vect2 pos;
    vect2 dimension;
    float rot;
};
then in your draw method:
for (int i = 0; i < vector.size(); i++) {
    TriangleInfo tInfo = vector[i];
    matrix worldMatrix = Transpose(matrix(tInfo.dimension) * matrix(tInfo.rot) * matrix(tInfo.pos)); // scale * rotation * translation
    shaderParameters.worldMatrix = worldMatrix; // copy into the constant buffer each iteration (see the sketch below the shader)
    ..
    ..
    dctx->PSSetShaderResources(0, 1, &tInfo.texture);
    dctx->Draw(3, 0); // Draw(vertexCount, startVertexLocation): 3 vertices per triangle, or 4 for a two-triangle quad strip
}
then in your vertex shader:
cbuffer cbParameters : register( b0 ) {
float4x4 worldMatrix;
};
VOut main(float4 position : POSITION, float4 texCoord : TEXCOORD)
{
....
output.position = mul(position,worldMatrix);
...
}
Remember, this is all pseudo-code, but it should give you the idea. There is a problem, though, if you plan to draw a lot of triangles, say 1000: this is probably not the best option, because calling Draw once per triangle is very heavy for large counts. In that case you should use DrawIndexed and modify the vertex positions of each triangle, or use DrawInstanced, which is simpler, so that you can send all the information in a single draw call.
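The pseudo-code above glosses over how worldMatrix actually reaches the constant buffer before each Draw. One common way (a sketch only, with hypothetical names; it assumes the buffer was created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE) is to Map/Unmap it each iteration:
#include <d3d11.h>
#include <DirectXMath.h>

struct ShaderParameters { DirectX::XMFLOAT4X4 worldMatrix; };   // must match cbParameters in the shader

void SetWorldMatrix(ID3D11DeviceContext* dctx, ID3D11Buffer* constantBuffer,
                    const DirectX::XMFLOAT4X4& worldMatrix)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    dctx->Map(constantBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
    static_cast<ShaderParameters*>(mapped.pData)->worldMatrix = worldMatrix;  // already transposed, as in the loop above
    dctx->Unmap(constantBuffer, 0);
    dctx->VSSetConstantBuffers(0, 1, &constantBuffer);                        // register(b0) in the shader
}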
The Metal Shading Language includes a lot of mathematical functions, but it seems most of the code in the official Metal documentation just uses them to map vertices from pixel space to clip space, like:
RasterizerData out;
out.clipSpacePosition = vector_float4(0.0, 0.0, 0.0, 1.0);
float2 pixelSpacePosition = vertices[vertexID].position.xy;
vector_float2 viewportSize = vector_float2(*viewportSizePointer);
out.clipSpacePosition.xy = pixelSpacePosition / (viewportSize / 2.0);
out.color = vertices[vertexID].color;
return out;
Apart from GPGPU uses, where kernel functions do parallel computation, what can a vertex function do? Some examples would help. In a game, if all vertex positions are calculated by the CPU, why does the GPU still matter? What does a vertex function usually do?
Vertex shaders compute properties for vertices. That's their point. In addition to vertex positions, they also calculate lighting normals at each vertex. And potentially texture coordinates. And various material properties used by lighting and shading routines. Then, in the fragment processing stage, those values are interpolated and sent to the fragment shader for each fragment.
In general, you don't modify vertices on the CPU. In a game, you'd usually load them from a file into main memory, put them into a buffer, and send them to the GPU. Once they're on the GPU, you pass them to the vertex shader on each frame along with the model, view, and projection matrices. A single buffer containing the vertices of, say, a tree or a car's wheel might be used multiple times; each time, all the CPU sends is the model, view, and projection matrices. The model matrix is used in the vertex shader to reposition and scale the vertices' positions in world space. The view matrix then moves and rotates the world so that the virtual camera is at the origin, facing the appropriate way. Finally, the projection matrix transforms the vertices into clip space.
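To make that division of labour concrete, here is a small CPU-side sketch, using the GLM maths library purely as a stand-in for whatever maths types your engine provides; all camera and object values are made up. Per frame, this handful of matrices is everything the CPU has to send for a mesh whose vertices already live in GPU memory.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 BuildMVP(const glm::vec3& objectPosition, float objectAngleRadians, float aspectRatio)
{
    // Model: place and orient this one object in world space.
    glm::mat4 model = glm::translate(glm::mat4(1.0f), objectPosition);
    model = glm::rotate(model, objectAngleRadians, glm::vec3(0.0f, 1.0f, 0.0f));

    // View: move the world so the camera sits at the origin looking at the target.
    glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 2.0f, 5.0f),   // camera position
                                 glm::vec3(0.0f, 0.0f, 0.0f),   // look-at target
                                 glm::vec3(0.0f, 1.0f, 0.0f));  // up vector

    // Projection: map the camera-space frustum into clip space.
    glm::mat4 projection = glm::perspective(glm::radians(60.0f), aspectRatio, 0.1f, 100.0f);

    // This single matrix (or the three separately) is all that gets uploaded per object per frame.
    return projection * view * model;
}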
There are other things a vertex shader can do, too. You can pass in vertices that are in a grid in the x-y plane, for example. Then in your vertex shader you can sample a texture and use that to generate the z-value. This gives you a way to change the geometry using a height map.
On older hardware (and some lower-end mobile hardware) it was expensive to do calculations on a texture coordinate before using it to sample from a texture because you lose some cache coherency. For example, if you wanted to sample several pixels in a column, you might loop over them adding an offset to the current texture coordinate and then sampling with the result. One trick was to do the calculation on the texture coordinates in the vertex shader and have them automatically interpolated before being sent to the fragment shader, then doing a normal look-up in the fragment shader. (I don't think this is an optimization on modern hardware, but it was a big win on some older models.)
First, I'll address this statement
In a game, if all vertices positions are calculated by the CPU, why GPU still matters? What does vertex function do usually?
I don't believe I've ever seen anyone calculating positions on the CPU for meshes that will later be rendered on a GPU. It's slow, you would need to get all that data from the CPU to the GPU (which means copying it over a bus if you have a dedicated GPU), and it's just not that flexible. There are many more things besides vertex positions that are required to produce any meaningful image, and calculating all of this on the CPU is wasteful, since the CPU doesn't care about this data for the most part.
The sole purpose of a vertex shader is to provide the rasterizer with primitives in clip space. But there are other uses, mostly tricks based on various GPU features.
For example, vertex shaders can write data out to buffers, so you can stream out transformed geometry and avoid transforming it again in a later pass when multi-pass rendering uses the same geometry more than once.
You can also use a vertex shader to output just one triangle that covers the whole screen, so that the fragment shader gets called once per pixel for the whole screen (though, honestly, you are better off using compute (kernel) shaders for that).
You can also write data out from the vertex shader without generating any primitives, by emitting degenerate triangles. You can use this to compute bounding boxes: using atomic operations you update min/max positions and read them back at a later stage. This is useful for light culling, frustum culling, tile-based processing, and many other things.
But, and it's a BIG BUT, you can do most of this in a compute shader without making the GPU run the whole vertex-assembly pipeline. That means you can do full-screen effects with just a compute shader (instead of a vertex and fragment shader plus the many pipeline stages in between, such as the rasterizer, primitive culling, depth testing, and output merging). You can also calculate bounding boxes and do light or frustum culling in a compute shader.
There are reasons to fire up the whole rendering pipeline instead of just running a compute shader: for example, if you will actually use the triangles output from the vertex shader, or if you aren't sure how primitives are laid out in memory and need the input assembler to do the heavy lifting of assembling them. But, getting back to your point, almost all of the reasonable uses for a vertex shader involve outputting primitives in clip space. If you aren't using the resulting primitives, it's probably best to stick to compute shaders.
I've been studying HLSL shaders for an XNA project (so no DX10/DX11), but almost all the resources I found were tutorials for effects where most of the work is done in the pixel shader. For instance, for lighting, the vertex shader is used only to feed the pixel shader normals and other data like that.
I'd like to make some effects based on the vertex shader rather than the pixel shader, like deformation for instance. Could someone suggest a book or a website? Even the bare name of an effect would be useful, since then I could google it.
A lot of lighting, etc. is done in the pixel shader because the resulting image quality will be much better.
Imagine a sphere that is created by subdividing a cube or icosahedron. If lighting calculations are done in the vertex shader, the resulting values will be interpolated between face edges, which can lead to a flat or faceted appearance.
Things like blending and morphing are done in the vertex shader because that's where you can manipulate the vertices.
For example:
matrix World;
matrix View;
matrix Projection;
float WindStrength;
float3 WindDirection;
VertexPositionColor VS(VertexPositionColor input)
{
    VertexPositionColor output;
    float4 worldPosition = mul(World, input.Position);
    worldPosition.xyz += WindDirection * WindStrength * worldPosition.y;
    output.Position = mul(mul(View, Projection), worldPosition);
    output.Color = input.Color;
    return output;
}
(Pseudo-ish code since I'm writing this in the SO post editor.)
In this case, I'm offsetting vertices that are "high" on the Y axis with a wind direction and strength. If I use this when rendering grass, for instance, the tops of the blades will lean in the direction of the wind, while the vertices that are closer to the ground (ideally with a Y of zero) will not move at all. The math here should be tweaked a bit to take into account really tall things that would cause unacceptably large changes, and the wind should not be uniformly applied to all blades, but it should be clear that here the vertex shader is modifying the mesh in a non-uniform way to get an interesting effect.
No matter the effect you are trying to achieve - morphing, billboards (so the item you're drawing always faces the camera), etc., you're going to wind up passing some parameters into the VS that are then selectively applied to vertices as they pass through the pipeline.
A fairly trivial example would be "inflating" a model into a sphere, based on some parameter.
Pseudocode again,
matrix World;
matrix View;
matrix Projection;
float LerpFactor;
VertexShader(VertexPositionColor input)
{
    float3 normal = normalize(input.Position.xyz);
    float3 position = lerp(input.Position.xyz, normal, LerpFactor);
    matrix wvp = mul(mul(World, View), Projection);
    float4 outputPosition = mul(wvp, float4(position, 1.0));
    ....
}
By stepping the uniform LerpFactor from 0 to 1 across a number of frames, your mesh (ideally a convex polyhedron) will gradually morph from its original shape to a sphere. Of course, you could include more explicit morph targets in your vertex declaration and morph between two model shapes, collapse a model to a less complex version of itself, open the lid on a box (or completely unfold it), etc. The possibilities are endless.
For more information, this page has some sample code on generating and using morph targets on the GPU.
If you need some good search terms, look for "xna bones," "blendweight" and "morph targets."
I have a C++ DirectX 11 renderer that I have been writing.
I have written a COLLADA 1.4.1 loader to import COLLADA data for use in supporting skeletal animations.
I'm validating the loader at this point (and I've supported COLLADA before in another renderer I've written previously using different technology) and I'm running into a problem matching up COLLADA with DX10/11.
I have 3 separate vertex buffers of data:
A vertex buffer of unique vertex positions.
A vertex buffer of unique normals.
A vertex buffer of unique texture coordinates.
These vertex buffers have different lengths (positions has 2910 elements, normals more than 9000, and texture coordinates roughly 3200).
COLLADA provides a triangle list which gives me the indices into each of these arrays for a given triangle (verbose and oddly done at first, but ultimately it becomes simple once you've worked with it.)
Knowing that DX10/11 supports multiple vertex buffers, I figured I would fill the DX10/11 index buffer with indices into each of these buffers *and* (this is the important part) these indices could be different for a given point of a triangle.
In other words, I could set the three vertex buffers, set the correct input layout, and then in the index buffer I would put the equivalent of:
l_aIndexBuffer[ NumberOfTriangles * 3 ]
for( i = 0; i < NumberOfTriangles; i++ )
{
l_aIndexBufferData.add( triangle[i].Point1.PositionIndex )
l_aIndexBufferData.add( triangle[i].Point1.NormalIndex )
l_aIndexBufferData.add( triangle[i].Point1.TextureCoordinateIndex )
}
The documentation regarding using multiple vertex buffers in DirectX doesn't seem to give any information about how this affects the index buffer (more on this later.)
Running the code that way yielded strange rendering results, where I could see the mesh being drawn intermittently correctly (strange polygons, but about a third of the points were in the correct place - hint - hint).
I figured I'd screwed up my data or my indices at this point (yesterday), so I painstakingly validated it all, and then I figured I was screwing up my input layout or something else. I eliminated this by using the values from the normal and texture buffers to alternately set the color value used by the pixel shader; the colors were correct, so I wasn't suffering from a padding issue.
Ultimately I came to the conclusion that DX10/11 must expect the data to be ordered in a different fashion, so I tried storing the indices this way:
indices.add( Point1Position index )
indices.add( Point2Position index )
indices.add( Point3Position index )
indices.add( Point1Normal index )
indices.add( Point2Normal index )
indices.add( Point3Normal index )
indices.add( Point1TexCoord index )
indices.add( Point2TexCoord index )
indices.add( Point3TexCoord index )
Oddly enough, this yielded a rendered mesh that looked 1/3 correct - hint - hint.
I then surmised that maybe DX10/DX11 wanted the indices stored 'by vertex buffer' meaning that I would add all the position indices for all the triangles first, then all the normal indices for all the triangles, then all the texture coordinate indices for all the triangles.
This yielded another 1/3 correct (looking) mesh.
This made me think - well, surely DX10/11 wouldn't provide you with the ability to stream from multiple vertex buffers and then actually expect only one index per triangle point?
Only including indices into the vertex buffer of positions yields a properly rendered mesh that unfortunately uses the wrong normals and texture coordinates.
It appears that putting the normal and texture coordinate indices into the index buffer caused erroneous drawing over the properly rendered mesh.
Is this the expected behavior?
Multiple Vertex Buffers - One Index Buffer and the index buffer can only have a single index for a point of a triangle?
That really just doesn't make sense to me.
Help!
The very first thing that comes to mind:
All hardware that supports compute shaders (which is almost all DirectX 10 hardware and higher) also supports ByteAddressBuffers, and most of it supports StructuredBuffers. So you can bind your arrays as SRVs and have random access to any of their elements in shaders.
Something like this (not tested, just pseudocode):
// Indices passed as vertex buffer to shader
// Think of them as of "references" to real data
struct VS_INPUT
{
uint posidx;
uint noridx;
uint texidx;
};
// The real vertex data
// You pass it as structured buffers (similar to textures)
StructuredBuffer<float3> pos : register (t0);
StructuredBuffer<float3> nor : register (t1);
StructuredBuffer<float2> tex : register (t2);
VS_OUTPUT main(VS_INPUT indices)
{
// in shader you read data for current vertex
float3 vertexPos = pos[indices.posidx];
float3 vertexNor = nor[indices.noridx];
float2 vertexTex = tex[indices.texidx];
// here you do something
}
Let's call that the "compute shader approach". You must use the DirectX 11 API for it.
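Host-side, each StructuredBuffer above is an ordinary buffer created with the D3D11_RESOURCE_MISC_BUFFER_STRUCTURED flag plus a shader resource view, bound to the vertex shader with VSSetShaderResources. A rough, untested sketch; the names are mine:
#include <d3d11.h>

// Creates a StructuredBuffer<T>-style buffer and its SRV from a CPU-side array.
void CreateStructuredSRV(ID3D11Device* device,
                         const void* data, UINT elementCount, UINT elementSize,
                         ID3D11Buffer** buffer, ID3D11ShaderResourceView** srv)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = elementCount * elementSize;
    desc.Usage = D3D11_USAGE_IMMUTABLE;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
    desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = elementSize;
    D3D11_SUBRESOURCE_DATA init = { data };
    device->CreateBuffer(&desc, &init, buffer);

    D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format = DXGI_FORMAT_UNKNOWN;            // required for structured buffers
    srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
    srvDesc.Buffer.NumElements = elementCount;
    device->CreateShaderResourceView(*buffer, &srvDesc, srv);
}

// Then, before drawing, bind the three arrays to t0..t2 as the shader above expects:
// ID3D11ShaderResourceView* views[3] = { posSRV, norSRV, texSRV };
// context->VSSetShaderResources(0, 3, views);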
You can also bind your indices in the same fashion and do some magic in the shader; in that case you need to find the current index yourself, and you can probably take it from SV_VertexID.
And you could probably work around these buffers entirely and bind the data some other way (DirectX 9 compatible texture sampling! O_o).
Hope it helps!
I have two sets of vertexes used as a line strip:
Vertexes1
Vertexes2
It's important to know that these vertexes have previously unknown values, as they are dynamic.
I want to make an animated transition (morph) between these two. I have come up with two different ways of doing this:
Option 1:
Set a Time uniform in the vertex shader, that goes from 0 - 1, where I can do something like this:
// Inside main() in the vertex shader
float originX = Position.x;
float destinationX = DestinationVertexPosition.x;
float interpolatedX = originX + (destinationX - originX) * Time;
gl_Position.x = interpolatedX;
As you probably see, this has one problem: How do I get the "DestinationVertexPosition" in there?
Option 2:
Make the interpolation calculation outside the vertex shader, where I loop through each vertex and create a third vertex set for the interpolated values, and use that to render:
// Pre render
// Use this vertex set to render
InterpolatedVertexes
for (unsigned int i = 0; i < vertexCount; i++) {
float originX = Vertexes1[i].x;
float destinationX = Vertexes2[i].x;
float interpolatedX = originX + (destinationX - originX) * Time;
InterpolatedVertexes[i].x = interpolatedX;
}
I have highly simplified these two code snippets, just to make the idea clear.
Now, from the two options, I feel like the first one is definitely better in terms of performance, given stuff happens at the shader level, AND I don't have to create a new set of vertexes each time the "Time" is updated.
So, now that the introduction to the problem has been covered, I would appreciate any of the following three things:
A discussion of better ways of achieving the desired results in OpenGL ES 2 (iOS).
A discussion about how Option 1 could be implemented properly, either by providing the "DestinationVertexPosition" or by modifying the idea somehow, to better achieve the same result.
A discussion about how Option 2 could be implemented.
In ES 2 you specify whatever attributes you like, so there's no problem with supplying attributes for both the origin and the destination positions and doing the linear interpolation between them in the vertex shader. However, you really shouldn't do it component by component, as your code suggests, because GPUs are vector processors, and the mix GLSL function will do the linear blend you want. So, e.g. (with obvious inefficiencies and assumptions):
int sourceAttribute = glGetAttribLocation(shader, "sourceVertex");
glEnableVertexAttribArray(sourceAttribute);
glVertexAttribPointer(sourceAttribute, 3, GL_FLOAT, GL_FALSE, 0, sourceLocations);
int destAttribute = glGetAttribLocation(shader, "destVertex");
glEnableVertexAttribArray(destAttribute);
glVertexAttribPointer(destAttribute, 3, GL_FLOAT, GL_FALSE, 0, destLocations);
And:
gl_Position = vec4(mix(sourceVertex, destVertex, Time), 1.0);
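Driving that from the CPU each frame is then just a matter of updating the Time uniform and drawing; continuing the sketch above, where 'progress' and 'vertexCount' are placeholders of mine:
glUseProgram(shader);
GLint timeUniform = glGetUniformLocation(shader, "Time");
glUniform1f(timeUniform, progress);            // progress runs from 0.0 to 1.0 over the transition
glDrawArrays(GL_LINE_STRIP, 0, vertexCount);   // the question's data is a line strip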
Your two options here have a trade off: supply twice the geometry once and interpolate between that, or supply only one set of geometry, but do so for each frame. You have to weigh geometry size vs. upload bandwidth.
Given my experience with iOS devices, I'd highly recommend option 1. Uploading new geometry on every frame can be extremely expensive on these devices.
If the vertices are constant, you can upload them once into one or two vertex buffer objects (VBOs) with the GL_STATIC_DRAW flag set. The PowerVR SGX series has hardware optimizations for dealing with static VBOs, so they are very fast to work with after the initial upload.
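Uploading the two vertex sets once into static VBOs would look something like this (a sketch; it assumes Vertexes1 and Vertexes2 are tightly packed arrays of three floats per vertex):
GLuint vbos[2];
glGenBuffers(2, vbos);

// Source positions, uploaded once.
glBindBuffer(GL_ARRAY_BUFFER, vbos[0]);
glBufferData(GL_ARRAY_BUFFER, vertexCount * 3 * sizeof(GLfloat), Vertexes1, GL_STATIC_DRAW);

// Destination positions, uploaded once.
glBindBuffer(GL_ARRAY_BUFFER, vbos[1]);
glBufferData(GL_ARRAY_BUFFER, vertexCount * 3 * sizeof(GLfloat), Vertexes2, GL_STATIC_DRAW);

// With a VBO bound, the last argument of glVertexAttribPointer() becomes a byte offset
// into the buffer (usually 0) rather than a CPU pointer.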
As far as how to upload two sets of vertices for use in a single shader, geometry is just another input attribute for your shader. You could have one, two, or more sets of vertices fed into a single vertex shader. You just define the attributes using code like
attribute vec3 startingPosition;
attribute vec3 endingPosition;
and interpolate between them using code like
vec3 finalPosition = startingPosition * (1.0 - fractionalProgress) + endingPosition * fractionalProgress;
Edit: Tommy points out the mix() operation, which I'd forgotten about and is a better way to do the above vertex interpolation.
In order to inform your shader program as to where to get the second set of vertices, you'd use pretty much the same glVertexAttribPointer() call for the second set of geometry as the first, only pointing to that VBO and attribute.
Note that you can perform this calculation as a vector, rather than breaking out all three components individually. This doesn't get you much with a highp default precision on current PowerVR SGX chips, but could be faster on future ones than doing this one component at a time.
You might also want to look into other techniques used for vertex skinning, because there might be other ways of animating vertices that don't require two full sets of vertices to be uploaded.
The one case that I've heard where option 2 (uploading new geometry on each frame) might be preferable is in specific cases where using the Accelerate framework to do vector manipulation of the geometry ends up being faster than doing the skinning on-GPU. I remember the Unity folks were talking about this once, but I can't remember if it was for really small or really large sets of geometry. Option 1 has been faster in all the cases I've worked with myself.
I'm trying to implement the technique described at : Compositing Images with Depth.
The idea is to use an existing texture (loaded from an image) as a depth mask, to basically fake 3D.
The problem I face is that glDrawPixels is not available in OpenGL ES. Is there a way to accomplish the same thing on the iPhone?
The depth buffer is more obscured than you think in OpenGL ES; not only is glDrawPixels absent but gl_FragDepth has been removed from GLSL. So you can't write a custom fragment shader to spool values to the depth buffer as you might push colours.
The most obvious solution is to pack your depth information into a texture and to use a custom fragment shader that does a depth comparison between the fragment it generates and one looked up from a texture you supply. Only if the generated fragment is closer is it allowed to proceed. The normal depth buffer will catch other cases of occlusion and — in principle — you could use a framebuffer object to create the depth texture in the first place, giving you a complete on-GPU round trip, though it isn't directly relevant to your problem.
Disadvantages are that drawing will cost you an extra texture unit and textures use integer components.
EDIT: for the purposes of keeping the example simple, suppose you were packing all of your depth information into the red channel of a texture. That'd give you a really low precision depth buffer, but just to keep things clear, you could write a quick fragment shader like:
void main()
{
// write a value to the depth map
gl_FragColor = vec4(gl_FragCoord.w, 0.0, 0.0, 1.0);
}
To store depth in the red channel. So you've partially recreated the old depth texture extension — you'll have an image that has a brighter red in pixels that are closer, a darker red in pixels that are further away. I think that in your question, you'd actually load this image from disk.
To then use the texture in a future fragment shader, you'd do something like:
uniform sampler2D depthMap;
uniform mediump vec2 viewportSize; // needed to turn gl_FragCoord's window coordinates into [0, 1] texture coordinates
void main()
{
    // read a value from the depth map
    lowp vec3 colourFromDepthMap = texture2D(depthMap, gl_FragCoord.xy / viewportSize).rgb;
    // discard the current fragment if it is less close than the stored value
    if(colourFromDepthMap.r > gl_FragCoord.w) discard;
... set gl_FragColor appropriately otherwise ...
}
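On the CPU side, feeding that shader is the usual texture-unit plumbing; a sketch, where program, depthTexture, and the viewport dimensions are placeholders for your own objects:
glUseProgram(program);

// Bind the depth-map image (loaded from disk into depthTexture) to texture unit 0.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, depthTexture);
glUniform1i(glGetUniformLocation(program, "depthMap"), 0);

// Let the shader normalise gl_FragCoord.xy into [0, 1] texture coordinates.
glUniform2f(glGetUniformLocation(program, "viewportSize"), viewportWidth, viewportHeight);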
EDIT2: you can see a much smarter mapping from depth to an RGBA value here. To tie in directly to that document, OES_depth_texture definitely isn't supported on the iPad or on the third generation iPhone. I've not run a complete test elsewhere.