How many times is the vertex shader called with Metal?

I've been learning some basic Metal rendering, and I am stuck on some basic concepts:
I know we can send vertex data to a shader using:
renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
And then we can retrieve it in the shader with:
vertex float4 basic_vertex(const device VertexIn* vertexIn [[ buffer(0) ]], unsigned int vid [[ vertex_id ]])
As I understand it, the vertex function is called once per vertex, and vertex_id is updated on each call to contain the index of the current vertex.
The question is: where does that vertex_id come from?
I could send the shader more data, with different sizes:
renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
renderEncoder.setVertexBuffer(vertexBuffer2, offset: 0, index: 1)
If vertexBuffer has 3 elements and vertexBuffer2 has 10 elements... how many times is the vertex function called? 10?
Thanks!

That's determined by the draw call you make on the render command encoder. Take the simplest draw method:
drawPrimitives(type:vertexStart:vertexCount:)
The vertexCount determines how many times your vertex function is called. The vertex IDs passed to the vertex function are those in the range from vertexStart to vertexStart + vertexCount - 1.
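For example (a minimal sketch reusing the question's renderEncoder):
// Calls the vertex function 3 times, with vertex_id values 0, 1, 2.
renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)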
If you consider another draw method:
drawPrimitives(type:vertexStart:vertexCount:instanceCount:)
That goes over the same range of vertex IDs. However, it calls your vertex function vertexCount * instanceCount times. There will be instanceCount calls with the vertex ID being vertexStart. The instance ID will range from 0 to instanceCount - 1 for those calls. Likewise, there will be instanceCount calls with the vertex ID being vertexStart + 1 (assuming vertexCount >= 2), one with each instance ID in [0..instanceCount-1]. Etc.
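As a sketch:
// 3 vertex IDs × 2 instances = 6 vertex-function invocations.
renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3, instanceCount: 2)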
The other draw methods have various other options, but they mostly don't affect how many times the vertex function is called. For example, baseInstance shifts the range of the instance IDs, but not its size.
The various drawIndexedPrimitives() methods get the specific vertex IDs from a buffer instead of enumerating all vertex IDs in a range. That buffer may contain a given vertex ID in multiple locations. For that case, I don't think it's defined whether the vertex function might be called multiple times for the same vertex ID and instance ID. Metal will presumably try to avoid duplicating effort, but it might end up actually being faster to just call the vertex function for every index in the index buffer even if multiple such indexes end up being the same vertex ID.
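For example (a sketch; indexBuffer is assumed to be an MTLBuffer holding six 16-bit indices):
// The 6 vertex IDs come from the buffer's contents, e.g. [0, 1, 2, 2, 1, 3],
// where vertex IDs 1 and 2 each appear twice.
renderEncoder.drawIndexedPrimitives(type: .triangle, indexCount: 6, indexType: .uint16, indexBuffer: indexBuffer, indexBufferOffset: 0)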
The relationship between vertexes and the data in the buffers you pass to the vertex processing stage is entirely up to you. You don't have to pass any buffers at all. For example, a vertex function could generate vertex information completely computationally, just from the vertex ID and instance ID.
It is pretty common, of course, for at least some of the buffers to contain arrays of per-vertex data that are indexed into using the vertex ID. Other buffers might be uniform data that's the same for all vertexes (that is, you don't index into that buffer using the vertex ID). Metal itself doesn't know this, though.
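To make that concrete, here's a minimal shader sketch (a VertexIn type with a float3 position and a Uniforms struct with an mvp matrix are assumptions). The draw call's vertexCount decides how many times it runs; the buffer sizes never enter into it:
vertex float4 basic_vertex(const device VertexIn* vertexIn [[ buffer(0) ]],
                           constant Uniforms& uniforms [[ buffer(1) ]],
                           uint vid [[ vertex_id ]])
{
    // vertexIn is indexed per vertex; uniforms is read identically by every invocation.
    return uniforms.mvp * float4(vertexIn[vid].position, 1.0);
}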

Related

Running compute kernel on portion of MTLBuffer?

I am populating an MTLBuffer with float2 vectors. The buffer is being created and populated like this:
struct Particle {
    var position: float2
    ...
}
let particleCount = 100000
let bufferSize = MemoryLayout<Particle>.stride * particleCount
particleBuffer = device.makeBuffer(length: bufferSize, options: [])!
var pointer = particleBuffer.contents().bindMemory(to: Particle.self, capacity: particleCount)
pointer = pointer.advanced(by: currentParticles)
pointer.pointee.position = [x, y]
In my Metal file the buffer is being accessed like this:
struct Particle {
    float2 position;
    ...
};
kernel void compute(device Particle *particles [[buffer(0)]],
                    uint id [[thread_position_in_grid]] …)
I need to be able to run the compute kernel over a given range of the MTLBuffer. For example, is it possible to run it on, say, only the elements from the 50,000th through the 75,000th?
It seems like the offset parameter lets me specify the start position, but there is no length parameter.
I see there is this call:
setBuffers(_:offsets:range:)
Does the range specify what portion of the buffer to run on? It seems like the range specifies which buffer slots are set, not the range of values to use.
A compute shader doesn't run "on" a buffer (or portion). It runs on a grid, which is an abstract concept. As far as Metal is concerned, the grid isn't related to a buffer or anything else.
A buffer may be an input that your compute shader uses, but how it uses it is up to you. Metal doesn't know or care.
Here's my answer to a similar question.
So, the dispatch command you encode using a compute command encoder governs how many times your shader is invoked. It also dictates what thread_position_in_grid (and related) values each invocation receives. If your shader correlates each grid position to an element of an array backed by a buffer, then the number of threads you specify in your dispatch command governs how much of the buffer you end up accessing. (Again, this is not something Metal dictates; it's implicit in the way you code your shader.)
Now, to start at the 50,000th element, using an offset on the buffer binding to make that the effective start of the buffer from the point of view of the shader is a good approach. But it would also work to just add 50,000 to the index the shader computes when it accesses the buffer. And, if you only want to process 25,000 elements (75,000 minus 50,000), then just dispatch 25,000 threads.
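A minimal sketch of that last suggestion, assuming a computeEncoder with the pipeline state already set (the 64-wide threadgroup is an arbitrary choice, and dispatchThreads requires a GPU with non-uniform threadgroup support):
let start = 50_000
let count = 25_000
// Bind the buffer so element 50,000 appears as particles[0] in the kernel.
computeEncoder.setBuffer(particleBuffer, offset: MemoryLayout<Particle>.stride * start, index: 0)
// Dispatch exactly 25,000 threads; the kernel indexes them via thread_position_in_grid.
computeEncoder.dispatchThreads(MTLSize(width: count, height: 1, depth: 1),
                               threadsPerThreadgroup: MTLSize(width: 64, height: 1, depth: 1))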

VBO and EBO state in WebGL

In WebGL, to draw something using an index buffer, you would go through roughly this routine (as hinted by MDN):
setup:
    bindBuffer(ARRAY_BUFFER);
    bufferData(pass vertex data);
    bindBuffer(ELEMENT_ARRAY_BUFFER);
    bufferData(pass index data);
drawing:
    bindBuffer(ELEMENT_ARRAY_BUFFER);
    glDrawElements(...);
At draw time there is no bindBuffer(ARRAY_BUFFER) call.
Suppose I have multiple VBOs with vertex data. How will an EBO know which buffer to take the data from?
In standard OpenGL I would encapsulate this state in a VAO, but the lack thereof in WebGL is what confuses me.
Without VAOs your typical path is this:
setup:
    create programs and look up attribute and uniform locations
    create buffers
    create textures
drawing:
    for each model
        for each attribute
            bindBuffer(ARRAY_BUFFER, model's buffer for attribute)
            vertexAttribPointer(...)
        bindBuffer(ELEMENT_ARRAY_BUFFER, model's ebo)
        set uniforms and bind textures
        glDrawElements(...)
With VAOs this changes to:
setup:
    create programs and look up attribute and uniform locations
    create buffers
    create textures
    for each model
        create and bind VAO
        for each attribute
            bindBuffer(ARRAY_BUFFER, model's buffer for attribute)
            vertexAttribPointer(...)
        bindBuffer(ELEMENT_ARRAY_BUFFER, model's ebo)
drawing:
    for each model
        bindVertexArray(model's VAO)
        set uniforms and bind textures
        glDrawElements(...)
BTW: WebGL 1 has VAOs as an extension (OES_vertex_array_object), which is available on most devices, and there's a polyfill you can use to make it look like it's available everywhere. So if you're used to using VAOs, I'd suggest using the polyfill.
How will an EBO know which buffer to take the data from?
EBOs don't take data from buffers; they just supply indices. Attributes take data from buffers, and each attribute records the current ARRAY_BUFFER binding when you call vertexAttribPointer. In other words:
gl.bindBuffer(ARRAY_BUFFER, bufferA);
gl.vertexAttribPointer(positionLocation, ...);
gl.bindBuffer(ARRAY_BUFFER, bufferB);
gl.vertexAttribPointer(normalLocation, ...);
gl.bindBuffer(ARRAY_BUFFER, bufferC);
gl.vertexAttribPointer(texcoordLocation, ...);
In this case positions will come from bufferA, normals from bufferB, and texcoords from bufferC. That's the same with or without VAOs. The difference is whether that attribute state (and the EBO binding) is global or stored per VAO.
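For completeness, a rough sketch of the WebGL 1 extension path mentioned above (bufferA, positionLocation, ebo, and numIndices are assumed to exist):
var ext = gl.getExtension('OES_vertex_array_object');
// setup: record attribute state and the EBO binding into the VAO
var vao = ext.createVertexArrayOES();
ext.bindVertexArrayOES(vao);
gl.bindBuffer(gl.ARRAY_BUFFER, bufferA);
gl.vertexAttribPointer(positionLocation, 3, gl.FLOAT, false, 0, 0);
gl.enableVertexAttribArray(positionLocation);
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, ebo);
ext.bindVertexArrayOES(null);
// drawing: a single bind restores all of that state
ext.bindVertexArrayOES(vao);
gl.drawElements(gl.TRIANGLES, numIndices, gl.UNSIGNED_SHORT, 0);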

Multiple models in Metal. How?

This is an absolute beginner question.
Background: I’m not really a game developer, but I’m trying to learn the basics of low-level 3D programming, because it’s a fun and interesting topic. I’ve picked Apple’s Metal as the graphics framework. I know about SceneKit and other higher-level frameworks, but I’m intentionally trying to learn the low-level bits. Unfortunately I’m way out of my depth, and there seem to be very few beginner-oriented Metal resources on the web.
By reading the Apple documentation and following the tutorials I could find, I’ve managed to implement a simple vertex shader and a fragment shader and draw an actual 3D model on the screen. Now I’m trying to draw a second model, but I’m kind of stuck, because I’m not really sure of the best way to go about it.
Do I…
Use a single vertex buffer and index buffer for all of my models, and tell the MTLRenderCommandEncoder the offsets when rendering the individual models?
Have a separate vertex buffer / index buffer for each model? Would such an approach scale?
Something else?
TL;DR: What is the recommended way to store the vertex data of multiple models in Metal (or any other 3D framework)?
There is no one recommended way. When you're working at such a low level as Metal, there are many possibilities, and the one you pick depends heavily on the situation and what performance characteristics you want/need to optimize for. If you're just playing around with intro projects, most of these decisions are irrelevant, because the performance issues won't bite until you scale up to a "real" project.
Typically, game engines use one buffer (or set of vertex/index buffers) per model, especially if each model requires different render states (e.g. shaders, bound textures). This means that when new models are introduced to the scene, or old ones are no longer needed, the requisite resources can be loaded into / removed from GPU memory (by way of creating / destroying MTLBuffer objects).
The main use case for doing multiple draws out of (different parts of) the same buffer is when you're mutating the buffer. For example, on frame n you're using the first 1KB of a buffer to draw with, while at the same time you're computing / streaming in new vertex data and writing it to the second 1KB of the buffer... then for frame n + 1 you switch which parts of the buffer are being used for what.
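A rough sketch of that ping-pong pattern (renderEncoder, sharedBuffer, frameIndex, vertexCount, and the 1 KB regions are all placeholder assumptions):
let region = 1024
// Draw frame n from one half of the buffer while the CPU fills the other half.
let drawOffset = (frameIndex % 2) * region
renderEncoder.setVertexBuffer(sharedBuffer, offset: drawOffset, index: 0)
renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount)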
To add a bit to rickster's answer, I would encapsulate your model in a class that contains one buffer (or two, if you count the index buffer) per model, and pass an optional parameter with the number of instances of that model you want to create.
Then, keep an additional buffer where you store whatever variations you want to introduce per instance. Usually, it's just the transform and a different material. For instance,
struct PerInstanceUniforms {
    var transform : Transform
    var material : Material
}
In my case, the material contains a UV transform, but the texture has to be the same for all the instances.
Then your model class would look something like this,
class Model {
    fileprivate var indexBuffer : MTLBuffer!
    fileprivate var vertexBuffer : MTLBuffer!
    var perInstanceUniforms : [PerInstanceUniforms]
    let uniformBuffer : MTLBuffer!
    // ... constructors, etc.

    func draw(_ encoder: MTLRenderCommandEncoder) {
        encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
        RenderManager.sharedInstance.setUniformBuffer(encoder, atIndex: 1)
        encoder.setVertexBuffer(self.uniformBuffer, offset: 0, index: 2)
        encoder.drawIndexedPrimitives(type: .triangle, indexCount: numIndices, indexType: .uint16, indexBuffer: indexBuffer, indexBufferOffset: 0, instanceCount: self.numInstances)
    }

    // this gets called when we need to update the buffers used by the GPU
    func updateBuffers(_ syncBufferIndex: Int) {
        // use stride, not size, so per-element padding matches the GPU-side layout
        let byteCount = MemoryLayout<PerInstanceUniforms>.stride * perInstanceUniforms.count
        let uniformData = uniformBuffer.contents().advanced(by: byteCount * syncBufferIndex)
        memcpy(uniformData, &perInstanceUniforms, byteCount)
    }
}
Your vertex shader with instances will look something like this,
vertex VertexInOut passGeometry(uint vid [[ vertex_id ]],
                                uint iid [[ instance_id ]],
                                constant TexturedVertex* vdata [[ buffer(0) ]],
                                constant Uniforms& uniforms [[ buffer(1) ]],
                                constant Transform* perInstanceUniforms [[ buffer(2) ]])
{
    VertexInOut outVertex;
    Transform t = perInstanceUniforms[iid];
    float4x4 m = uniforms.projectionMatrix * uniforms.viewMatrix;
    TexturedVertex v = vdata[vid];
    outVertex.position = m * float4(t * v.position, 1.0);
    outVertex.uv = float2(0, 0);
    outVertex.color = float4(0.5 * v.normal + 0.5, 1);
    return outVertex;
}
Here's an example I wrote of using instancing, with a performance analysis: http://tech.metail.com/performance-quaternions-gpu/
You can find the full code for reference here: https://github.com/endavid/VidEngine
I hope that helps.

Random access to D3D11 buffer with R8G8B8A8_UNorm format in HLSL

I have a D3D11 buffer with a few million elements that is supposed to hold data in the R8G8B8A8_UNorm format.
The desired behavior is the following: one shader calculates a vec4 and writes it to the buffer in a random-access pattern. In a later pass, another shader reads the data, also in a random-access pattern, and processes it further.
My best guess would be to create an UnorderedAccessView with the R8G8B8A8_UNorm format. But how do I declare the RWBuffer<?> in HLSL, and how do I write to and read from it? Is it necessary to declare it as RWBuffer<uint> and do the packing from vec4 to uint manually?
In OpenGL I would create a buffer and a buffer texture. Then I can declare an imageBuffer with the rgba8 format in the shader, access it with imageLoad and imageStore, and the hardware does all the conversions for me. Is this possible in D3D11?
This is a little tricky due to a lot of different gotchas, but you should be able to do something like this.
In your shader that writes to the buffer declare:
RWBuffer<float4> WriteBuf : register( u1 );
Note that it is bound to register u1 instead of u0. When UAVs are bound to the graphics pipeline (as with OMSetRenderTargetsAndUnorderedAccessViews below), the u# registers share slots with the render targets, so with one render target bound the first available UAV slot is u1.
To write to the buffer just do something like:
WriteBuf[0] = float4(0.5, 0.5, 0, 1);
Note that you must write all 4 values at once.
In your C++ code, you must create an unordered access buffer, and bind it to a UAV. You can use the DXGI_FORMAT_R8G8B8A8_UNORM format. When you write 4 floats to it, the values will automatically be converted and packed. The UAV can be bound to the pipeline using OMSetRenderTargetsAndUnorderedAccessViews.
In your shader that reads from the buffer declare a read only buffer:
Buffer<float4> ReadBuf : register( t0 );
Note that this buffer uses t0 because it will be bound as a shader resource view (SRV) instead of UAV.
To access the buffer use something like:
float4 val = ReadBuf[0];
In your C++ code, you can bind the same buffer you created earlier to an SRV instead of a UAV. The SRV can be bound to the pipeline using PSSetShaderResources and can also be created with DXGI_FORMAT_R8G8B8A8_UNORM.
You can't bind the same buffer to the pipeline as both an SRV and a UAV at the same time. So you must bind the UAV first and run your first shader pass; then unbind the UAV, bind the SRV, and run the second shader pass.
There are probably other ways to do this as well. Note that all of this requires shader model 5.
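On the C++ side, the two passes might be sequenced roughly like this (a sketch; ctx, rtv, dsv, uav, and srv are assumed to have been created already, with both buffer views using DXGI_FORMAT_R8G8B8A8_UNORM):
// Pass 1: bind the UAV at slot u1 alongside the render target, then run the writing shader.
ctx->OMSetRenderTargetsAndUnorderedAccessViews(1, &rtv, dsv, 1, 1, &uav, nullptr);
// ... draw ...

// Pass 2: replace the UAV with a null binding, then bind the same buffer as an SRV at t0.
ID3D11UnorderedAccessView* nullUAV = nullptr;
ctx->OMSetRenderTargetsAndUnorderedAccessViews(1, &rtv, dsv, 1, 1, &nullUAV, nullptr);
ctx->PSSetShaderResources(0, 1, &srv);
// ... draw with the reading shader ...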

using setvertexdeclaration with fixed-function pipeline in directx 9

I am trying to use my own vertex structure, upload the vertices into a vertex buffer (and the indices into an index buffer) without an FVF code, set up the vertex declaration and stream source, and draw them using DrawIndexedPrimitive with the fixed-function pipeline (not FVF).
Do I have to write my own shader to use SetVertexDeclaration in DirectX 9?
Can I use a customised vertex structure with SetVertexDeclaration and the fixed-function pipeline?
If I can, are there any restrictions on combining the fixed-function pipeline with vertex declarations?
Customised vertex structure:
struct PosNormTexCoord
{
    float x, y, z;
    float nx, ny, nz;
    float tu, tv;
};
Unfortunately, you can't use the fixed-function pipeline with a custom vertex format. But your structure can be expressed in FVF, so why would you want to skip it?
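For reference, the struct above maps directly onto standard FVF flags (a sketch; device is a placeholder IDirect3DDevice9*):
// position (x, y, z) + normal (nx, ny, nz) + one 2D texture coordinate set (tu, tv)
const DWORD PosNormTexFVF = D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1;
device->SetFVF(PosNormTexFVF);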
