Multiple constant buffer updates before a single Present - directx

I'm writing a particle system manager with SlimDX/D3D11, and I'm finding some problems with the constant buffers.
My particle system manager can draw multiple particle systems with different physical characteristics at the same time, and is based on the Particle 3D sample from XNA. The general idea of the draw operation is:
context.ClearRenderTargetView(...)
render particles
swapChain.Present()
The "render particles" operation is implemented like that:
    shaderTimeGlobalVariable.Set(currentTime)
    for each pass of the shader technique:
        apply pass
        for each particle system:
            set constant buffer with physical constants (EffectConstantBuffer.SetRawValue)
            issue one or more context.Draw calls
My problem is that it seems only the last constant buffer update takes effect, which forces all the particle systems to share the same physical characteristics.
So, first of all: I suspect this is correct/expected behaviour and that I'm missing some crucial point. Can anyone clarify what I am missing?
I expect one solution would be to move the physical characteristics of the particles into the vertex buffer, at a cost of about 12 bytes more per vertex, i.e. between 12 KB and 120 KB more for each particle system I'm working with. That is something I would like to avoid, if possible: is there any other option available?

If you change a shader variable, you have to (re-)apply the pass. So your steps will be:
    Set time
    for each particle system
        set constant buffer
        for each pass
            Apply pass
            Draw
Depending on the actual buffer data, you can reorder the calls for performance optimization.
The reason you have to apply the pass after setting the constant buffers is that the effect framework only sends data to the GPU on Apply calls.
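For illustration, here is roughly what that ordering looks like with the native Effects11 API in C++ (the SlimDX calls map onto the same objects). This is only a sketch: the effect variable names ("CurrentTime", "PerSystemCB") and the particle-system fields are made up and stand in for whatever your effect actually declares.

    #include <d3dx11effect.h>

    // Set the global time once per frame.
    effect->GetVariableByName("CurrentTime")->AsScalar()->SetFloat(currentTime);

    ID3DX11EffectTechnique* tech = effect->GetTechniqueByIndex(0);
    D3DX11_TECHNIQUE_DESC techDesc;
    tech->GetDesc(&techDesc);

    for (ParticleSystem& system : particleSystems)
    {
        // Update the per-system constants first...
        effect->GetConstantBufferByName("PerSystemCB")
              ->SetRawValue(&system.physics, 0, sizeof(system.physics));

        for (UINT p = 0; p < techDesc.Passes; ++p)
        {
            // ...then (re-)apply the pass, so the effect framework uploads the
            // new values before the draw call that uses them.
            tech->GetPassByIndex(p)->Apply(0, context);
            context->Draw(system.vertexCount, system.startVertex);
        }
    }

The important part is only the ordering: SetRawValue before Apply, and Apply before Draw, for each particle system.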


in-place processing of a Metal texture

Is it possible to process an MTLTexture in-place without osx_ReadWriteTextureTier2?
It seems like I can set two texture arguments to be the same texture. Is this supported behavior?
Specifically, I don't mind not having the texture cache update after a write. I just want to modify a 3D texture in place (and sparsely). It's memory-prohibitive to have two textures, and it's computationally expensive to copy the entire texture, especially when I might only be updating a small portion of it.
Per the documentation, regardless of feature availability, it is invalid to declare two separate texture arguments (one read, one write) in a function signature and then set the same texture for both.
Any Mac that supports osx_GPUFamily1_v2 supports function texture read-writes (by declaring the texture with access::read_write).
The distinction between "Tier 1" (which has no explicit constant) and osx_ReadWriteTextureTier2 is that the latter supports additional pixel formats for read-write textures.
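For reference, declaring the texture with access::read_write looks like the following in Metal Shading Language (a C++-based language). The kernel is purely illustrative: it just scales every voxel in place, and the function and argument names are made up.

    #include <metal_stdlib>
    using namespace metal;

    // Illustrative kernel: reads and writes the same 3D texture bound at index 0.
    // Requires a device/pixel-format combination that supports read_write access.
    kernel void scaleVoxelsInPlace(texture3d<float, access::read_write> volume [[texture(0)]],
                                   uint3 gid [[thread_position_in_grid]])
    {
        if (gid.x >= volume.get_width() ||
            gid.y >= volume.get_height() ||
            gid.z >= volume.get_depth())
            return;

        float4 value = volume.read(gid);
        volume.write(value * 0.5f, gid);   // in-place modification, no second texture
    }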
If you determine that your target Macs don't support the kind of texture read-writes you need (because you need to deploy to OS X 10.11 or because you're using an incompatible pixel format for the tier of machine you're deploying to), you could operate on your texture one plane at a time, reading from your 3D texture, writing to a 2D texture, and then blitting the result back into the corresponding region in your 3D texture. It's more work, but it'll use much less than double the memory.

ios games - Are there any drawbacks to GPU-side calculations?

The title is pretty much the question. I'm trying to understand how CPU and GPU cooperation works.
I'm developing my game with cocos2d. It is a game engine, so it redraws the whole screen 60 times per second. Every node in cocos2d draws its own set of triangles. Usually you set the triangle vertices after performing the node transforms (from node space to world space) on the CPU side. I've found a way to do it on the GPU side with vertex shaders, by passing the model-view-projection matrix as a uniform.
I see CPU time decrease by ~1 ms and GPU time rise by ~0.5 ms.
Can I consider this a performance gain?
In other words: if something can be done on the GPU side, are there any reasons you shouldn't do it?
The only time you shouldn't do something on the GPU side is if you need the result (in easily accessible form) on the CPU side to further the simulation.
Take your example: assume you have four 250 KB meshes representing a hierarchy of body parts (a skeleton), and that you use a 4x4 matrix of floats (64 bytes) for the transformation of each mesh. You could either:
Each frame, perform the mesh transformation calculations on the application side (CPU) and then upload the four meshes to the GPU. This would result in roughly 1000 KB of data being sent to the GPU per frame.
When the application starts, upload the data for the 4 meshes to the GPU (this will be in a rest/identity pose). Then each frame, when you make the render call, you calculate only the new matrices for each mesh (position/rotation/scale), upload those matrices to the GPU and perform the transformation there. This results in ~256 bytes being sent to the GPU per frame.
As you can see, even though the numbers in this example are made up, the main advantage is that you minimize the amount of data transferred between CPU and GPU each frame.
The only time you would prefer the first option is if your application needs the results of the transformation to do some other work. The GPU is very efficient (especially at processing vertices in parallel), but it isn't easy to get information back from it (and when you do, it's usually in the form of a texture, i.e. a render target). One concrete example of this 'further work' might be performing collision checks on the transformed mesh positions.
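As a rough C++/OpenGL sketch of the second option (the function names, buffer handles and uniform location are invented, and attribute setup is omitted): the mesh is uploaded once, and per frame only the 64-byte matrix travels to the GPU while the vertex transformation happens in the vertex shader.

    #include <glad/gl.h>   // or whichever OpenGL loader you use

    // One-time setup: upload the rest-pose mesh into a VBO on the GPU.
    void uploadMeshOnce(GLuint vbo, const float* vertices, size_t byteCount)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, byteCount, vertices, GL_STATIC_DRAW);
    }

    // Per frame: only the transformation matrix is sent to the GPU.
    void drawMesh(GLuint program, GLint modelMatrixLocation,
                  const float* modelMatrix, GLuint vbo, GLsizei vertexCount)
    {
        glUseProgram(program);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glUniformMatrix4fv(modelMatrixLocation, 1, GL_FALSE, modelMatrix);
        glDrawArrays(GL_TRIANGLES, 0, vertexCount);
    }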
edit
To some extent*, you can tell where the data is stored based on how you are calling the OpenGL API. Here is a quick run-down:
Vertex Arrays
    glVertexPointer(...)
    glDrawArrays(...)
With this method you pass an array of vertices from the CPU to the GPU each frame. The vertices are processed sequentially, as they appear in the array. There is a variation of this method (glDrawElements) which lets you specify indices.
VBOs
    glBindBuffer(...)
    glBufferData(...)
    glDrawElements(...)
VBOs allow you to store the mesh data on the GPU (see the note below). This way, you don't need to send the mesh data to the GPU each frame, only the transformation data.
*Although we can indicate where our data is to be stored, the OpenGL specification does not actually say how vendors are to implement this. It means that we can give hints that our vertex data should be stored in VRAM, but ultimately it is down to the driver!
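To make the contrast concrete, here is a hedged C++ sketch of the two paths; the vertex/index pointers, buffer handles and counts are assumed to exist in the calling code.

    // Client-side vertex array: "vertices" lives in CPU memory and is
    // (potentially) re-sent to the GPU on every draw call.
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, vertices);   // pointer into CPU memory
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);

    // VBO path: the data was uploaded once with glBufferData (typically with a
    // GL_STATIC_DRAW hint), so draw calls only reference GPU-resident storage.
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glVertexPointer(3, GL_FLOAT, 0, nullptr);    // offset 0 into the bound VBO
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, nullptr);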
Good reference links for this stuff are:
OpenGL ref page: https://www.opengl.org/sdk/docs/man/html/start.html
OpenGL explanations: http://www.songho.ca/opengl
Java OpenGL concepts for rendering: http://www.java-gaming.org/topics/introduction-to-vertex-arrays-and-vertex-buffer-objects-opengl/24272/view.html

Compute Shader, Buffer or texture

I'm trying to implement fluid dynamics using compute shaders. In the article, a series of passes is performed on a texture, since it was written before compute shaders existed.
Would it be faster to do each pass on a texture or on a buffer? The final pass would have to be applied to a texture anyway.
I would recommend using whichever dimensionality of resource fits the simulation. If it's a 1D simulation, use a RWBuffer, if it's a 2D simulation use a RWTexture2D and if it's a 3D simulation use a RWTexture3D.
There appear to be stages in the algorithm that you linked that make use of bilinear filtering. If you restrict yourself to using a Buffer you'll have to issue 4 or 8 memory fetches (depending on 2D or 3D) and then more instructions to calculate the weighted average. Take advantage of the hardware's ability to do this for you where possible.
Another thing to be aware of is that data in textures is not laid out row by row (linearly) as you might expect; instead it's laid out so that neighbouring texels are as close to one another in memory as possible. This can be called tiling or swizzling, depending on whose documentation you read. For that reason, unless your simulation is one-dimensional, you may well get far better cache coherency on reads/writes from a resource whose layout most closely matches the dimensions of the simulation.
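For what it's worth, here is a host-side D3D11 sketch (C++) of a 2D texture the compute passes can both sample through an SRV (with hardware bilinear filtering) and write through a UAV, i.e. what HLSL sees as a RWTexture2D. The grid size and pixel format are placeholders, and in practice you would ping-pong between two such textures, reading one while writing the other.

    #include <d3d11.h>

    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = 256;                              // placeholder grid size
    desc.Height           = 256;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_R16G16B16A16_FLOAT;   // placeholder format
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_DEFAULT;
    desc.BindFlags        = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;

    ID3D11Texture2D* tex = nullptr;
    device->CreateTexture2D(&desc, nullptr, &tex);

    ID3D11ShaderResourceView*  srv = nullptr;   // bound to the pass that reads/filters
    ID3D11UnorderedAccessView* uav = nullptr;   // bound to the pass that writes
    device->CreateShaderResourceView(tex, nullptr, &srv);
    device->CreateUnorderedAccessView(tex, nullptr, &uav);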

Better to change uniforms or change program?

Using webgl I need to perform 3 passes to render my scene. Each pass runs the same geometry and shaders but has differing values for some uniforms and textures.
I seem to have two choices. Have a single "program" and set all of the uniforms and textures for each pass. Or have 3 "programs", each containing the same shaders, set all the necessary uniforms/textures once per program, and then just switch programs for each pass. This means I would do one useProgram call per pass instead of many setUniform calls per pass.
Is the second technique likely to be faster because it avoids very many setUniform calls, or is changing the program very expensive? I've done some trials, but with the very simple geometry I have at the moment I don't see any difference in performance, because setup costs overwhelm any differences.
Is there any reason to prefer one technique over the other?
Just send different values via glUniform if the shader programs are the same.
Switching between programs is generally slower than changing the value of a uniform.
In any case, an uber shader program (with a list of uniforms like useLighting, useAlphaMap) is not a good idea in most cases.
@gman: We are talking about WebGL (GLES 2.0), where we don't have UBOs (uniform buffer objects).
To the original poster: to sum up, try to avoid rebinding shader programs (but it's not the end of the world), and don't create one uber shader!
When you have large amounts of textures to rebind, texture atlasing should be the fastest solution: you don't need to rebind textures and you don't need to rebind programs. Textures can be switched by modifying uniforms that represent texCoord offsets.
Modifying such uniforms can be optimized even further:
You should consider moving frequently modified uniforms to attributes. Usually their data is provided through attribute pointers, but you can also use constant values when the attribute arrays are disabled. Instead of the uniformXXX() functions, use the vertexAttribXXX() functions to specify their constant values.
I think the best example is a light position. Normally, every time the light position changes, you'd have to specify new uniform values in ALL programs that make use of it. In contrast, when using such 'attributed' uniforms, you can specify the attribute value once, globally, when your light moves.
-pros:
This method is best suited when you have many programs that would like to share uniforms; since we can't use uniform buffers in WebGL, it seems to be the only reasonable solution.
-cons:
Of course, the space available for such 'attributed' uniforms is much smaller than for regular uniforms, but it can still speed things up a lot if you apply it to some of your uniforms.
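As a sketch of the 'attributed uniform' idea, here it is against the OpenGL ES 2.0 C API; WebGL exposes the same entry points (gl.bindAttribLocation, gl.disableVertexAttribArray, gl.vertexAttrib3f). The attribute name and index below are made up.

    #include <GLES2/gl2.h>

    // Assume every program was linked after calling
    //     glBindAttribLocation(program, LIGHT_POS_LOC, "a_lightPos");
    // so the light position lives at the same attribute index in all of them.
    static const GLuint LIGHT_POS_LOC = 7;   // illustrative index

    void setGlobalLightPosition(float x, float y, float z)
    {
        // With no array enabled for this index, the constant value below is fed
        // to every vertex in every program that reads this attribute: one call
        // when the light moves, instead of one glUniform call per program.
        glDisableVertexAttribArray(LIGHT_POS_LOC);
        glVertexAttrib3f(LIGHT_POS_LOC, x, y, z);
    }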

Speed of ComputeShader vs. PixelShader

I've got a question regarding ComputeShader compared to PixelShader.
I want to do some processing on a buffer, and this is possible both with a pixel shader and with a compute shader; now I wonder if either has any advantage over the other, specifically when it comes to speed. I've had issues with both when it comes to using just 8-bit values, but I should be able to work around that.
Every data point in the output will be calculated using, in total, 8 data points surrounding it (an MxN matrix), so I'd think this would be perfect for a pixel shader, since the different outputs don't influence each other at all.
But I was unable to find any benchmarks comparing the two, and now I wonder which one I should aim for. Speed is the only target.
From what I understand, shaders are shaders in the sense that they are just programs run by a lot of threads on data. Therefore, in general there should not be any difference in computing power/speed between doing the calculations in a pixel shader and doing them in a compute shader. However...
To do calculations in a pixel shader you have to massage your data so that it looks like image data: this means you have to draw a quad first of all, and also that your output must have the 'shape' of a pixel (basically a float4). This data must then be interpreted by your app into something useful.
If you're using a compute shader you can completely control the number of threads to use, whereas for pixel shaders the thread counts have to map to valid resolutions. You can also input and output data in any format you like and take advantage of accelerated conversion using UAVs (I think).
I'd recommend using compute shaders, since they are meant for general-purpose computation and are a lot easier to work with. Your overall application will probably be faster too, even if the actual shader computation time is about the same, simply because you can avoid some of the hoops you have to jump through to get pixel shaders to do what you want.
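As a hedged host-side D3D11 sketch (C++) of that control over thread counts, assuming a compute shader compiled with [numthreads(8, 8, 1)] and views that already exist (all names are illustrative):

    #include <d3d11.h>

    void runFilter(ID3D11DeviceContext* context, ID3D11ComputeShader* filterCS,
                   ID3D11ShaderResourceView* inputSRV,
                   ID3D11UnorderedAccessView* outputUAV,
                   UINT width, UINT height)
    {
        context->CSSetShader(filterCS, nullptr, 0);
        context->CSSetShaderResources(0, 1, &inputSRV);
        context->CSSetUnorderedAccessViews(0, 1, &outputUAV, nullptr);

        // You choose the thread-group counts yourself; round up so every
        // element of the width x height output is covered.
        context->Dispatch((width + 7) / 8, (height + 7) / 8, 1);

        // Unbind the UAV so the resource can be read as an SRV in a later pass.
        ID3D11UnorderedAccessView* nullUAV = nullptr;
        context->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);
    }

Note that there is no fullscreen quad, viewport or render target involved, which is exactly the hoop-jumping the pixel-shader route requires.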
