When to release a Vertex Array Object? - webgl

What are the guidelines for releasing a Vertex Array Object, e.g. binding to null?
Anecdotally it seems that with similar shaders I only need to release after some grouping... or is the best practice to do it after every shader group?
I guess there's another possibility of doing it after every draw call, even when they're batched by shader, but I don't think that's necessary...

It's not clear what you're asking. "When to release a texture?" When you're done using it? I think you mean "unbind", not "release". "Release" in most programming contexts means to delete something from memory, or at least to allow it to be deleted.
Assuming you mean when to unbind a Vertex Array Object (VAO) the truth is you never have to unbind a VAO.
As explained elsewhere, VAOs contain all attribute state AND the ELEMENT_ARRAY_BUFFER binding, so:
currentVAO = {
  elementArrayBuffer: someIndexBuffer,
  attributes: [
    { enabled: true,  size: 3, type: gl.FLOAT, stride: 0, offset: 0, buffer: someBuffer, },
    { enabled: true,  size: 3, type: gl.FLOAT, stride: 0, offset: 0, buffer: someBuffer, },
    { enabled: true,  size: 3, type: gl.FLOAT, stride: 0, offset: 0, buffer: someBuffer, },
    { enabled: false, size: 3, type: gl.FLOAT, stride: 0, offset: 0, buffer: null, },
    { enabled: false, size: 3, type: gl.FLOAT, stride: 0, offset: 0, buffer: null, },
    { enabled: false, size: 3, type: gl.FLOAT, stride: 0, offset: 0, buffer: null, },
    ...
    ... up to MAX_VERTEX_ATTRIBS ...
  ]
};
The key thing to remember is that gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, someBuffer) affects state inside the current VAO, not global state the way gl.bindBuffer(gl.ARRAY_BUFFER, someBuffer) does.
I think that's the most confusing part. Most WebGL methods make it clear what's being affected: gl.bufferXXX affects buffers, gl.texXXX affects textures, gl.renderbufferXXX renderbuffers, gl.framebufferXXX framebuffers, gl.vertexXXX vertex attributes (the VAO), etc. But gl.bindBuffer is different, at least in this case: it affects global state when binding to ARRAY_BUFFER, but VAO state when binding to ELEMENT_ARRAY_BUFFER.
My suggestion would be to follow these steps, in this order, during initialization:
for each object
1. create VAO
2. create vertex buffers and fill with data
3. setup all attributes
4. create index buffers (ELEMENT_ARRAY_BUFFER) and fill with data
At render time (a minimal sketch combining both loops follows this list):
for each object
1. use program (if program is different)
2. bind VAO for object (if different)
3. set uniforms and bind textures
4. draw
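Putting the two lists together, a minimal WebGL2 sketch might look like this. The buffer, attribute location, uniform, and count names (positions, indices, positionLoc, program, matrixLoc, matrix, numIndices) are placeholders, not from the question:
// initialization (per object)
const vao = gl.createVertexArray();
gl.bindVertexArray(vao);                                   // 1. create and bind the VAO

const positionBuffer = gl.createBuffer();                  // 2. vertex buffer + data
gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);

gl.enableVertexAttribArray(positionLoc);                   // 3. attributes (recorded in the VAO)
gl.vertexAttribPointer(positionLoc, 3, gl.FLOAT, false, 0, 0);

const indexBuffer = gl.createBuffer();                     // 4. index buffer + data
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuffer);       //    this binding is stored in the VAO
gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, indices, gl.STATIC_DRAW);

gl.bindVertexArray(null);  // optional guard so later ELEMENT_ARRAY_BUFFER binds can't touch this VAO

// render (per object)
gl.useProgram(program);                                    // 1. program (if different)
gl.bindVertexArray(vao);                                   // 2. VAO (if different)
gl.uniformMatrix4fv(matrixLoc, false, matrix);             // 3. uniforms and textures
gl.drawElements(gl.TRIANGLES, numIndices, gl.UNSIGNED_SHORT, 0);  // 4. draw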
What's important to remember is that if you ever call gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, ...) you're affecting the current VAO.
Why might I want to bind a null VAO? Mostly because I often forget the paragraph above, since before VAOs the ELEMENT_ARRAY_BUFFER binding was global state. So when I forget that, and I randomly bind some ELEMENT_ARRAY_BUFFER so that I can put some indices in it, I've just changed the ELEMENT_ARRAY_BUFFER binding in the current VAO. Probably not something I wanted to do. By binding null, say after initializing all my objects and after my render loop, I'm less likely to cause that bug.
Also note that if I do want to update the indices of some geometry, meaning I want to call gl.bufferData or gl.bufferSubData, I can make sure I'm affecting the correct buffer in one of two ways: one, by binding that buffer to ELEMENT_ARRAY_BUFFER and then calling gl.bufferData; two, by binding the appropriate VAO first.
If that didn't make sense, then assume I have 3 VAOs:
// pseudo code
forEach([sphere, cube, torus])
  create vao
  create buffers and fill with data
  create indices (ELEMENT_ARRAY_BUFFER)
  fill out attributes
Now that I have 3 shapes, let's say I wanted to change the indices in the sphere. I could do this in two ways.
One, I could bind the sphere's ELEMENT_ARRAY_BUFFER directly
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, sphereElementArrayBuffer)
gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, ...); // update the indices
This has the issue that if some other VAO is bound, I've just changed its ELEMENT_ARRAY_BUFFER binding.
Two, I could just bind the sphere's VAO since it's already got the ELEMENT_ARRAY_BUFFER bound
gl.bindVertexArray(sphereVAO);
gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, ...); // update the indices
This seems safer IMO.
To reiterate, ELEMENT_ARRAY_BUFFER binding is part of VAO state.

Related

How to bind a variable number of textures to Metal shader?

On the CPU I'm gathering an array of MTLTexture objects that I want to send to the fragment shader. There can be any number of these textures at any given moment. How can I send a variable-length array of MTLTextures to a fragment shader?
Example:
CPU:
var txrs: [MTLTexture] = []
for ... {
txrs.append(...)
}
// Send array of textures to fragment shader.
GPU:
fragment half4 my_fragment(Vertex v [[stage_in]], <array of textures>, ...) {
...
for(int i = 0; i < num_textures; i++) {
texture2d<half> txr = array_of_textures[i];
}
...
}
The array the other answer suggested won't work, because the textures will take up all the bind points up to 31, at which point you will run out.
Instead, you need to use argument buffers.
So, for this to work, you need tier 2 argument buffer support. You can check for it with the argumentBuffersSupport property on the MTLDevice.
You can read more about argument buffers here or watch this talk about bindless rendering.
The basic idea is to use an MTLArgumentEncoder to encode the textures you need into an argument buffer. Unfortunately, I don't think there's a direct way to just encode a bunch of MTLTextures, so instead you would create a struct in your shaders like this:
struct SingleTexture
{
texture2d<half> texture;
};
The texture in this struct has an implicit id of 0. To learn more about id, read the Argument Buffers section in the Metal Shading Language spec; it's basically a unique index for each entry in the argument buffer.
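Written out with the id made explicit, the struct above is equivalent to:
struct SingleTexture
{
    texture2d<half> texture [[id(0)]];
};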
Then, change your function signature to
fragment half4 my_fragment(Vertex v [[stage_in]], device ushort& textureCount [[ buffer(0) ]], device SingleTexture* textures [[ buffer(1) ]])
You will then need to bind the count (use uint16_t rather than uint32_t in most cases) as just a 2- or 4-byte buffer; you can use the set<...>Bytes methods on the encoder for that.
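For example, in the same Objective-C style as the snippets below (encoder is assumed to be your id<MTLRenderCommandEncoder>, and numberOfTextures is a placeholder for however you track the count):
uint16_t textureCount = (uint16_t)numberOfTextures;
[encoder setFragmentBytes:&textureCount length:sizeof(textureCount) atIndex:0];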
Then you will need to compile that function to an MTLFunction, and from it you can create an MTLArgumentEncoder using the newArgumentEncoderWithBufferIndex: method. You will use buffer index 1 in this case, because that's where your argument buffer is bound in the function.
From the MTLArgumentEncoder you can get encodedLength, which is basically the size of one SingleTexture entry in the argument buffer. Multiply it by the number of textures to get a buffer of the proper size to encode your argument buffer into.
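A sketch of that setup, assuming the compiled fragment function is named fragmentFunction and device is your id<MTLDevice>:
id<MTLArgumentEncoder> argumentEncoder = [fragmentFunction newArgumentEncoderWithBufferIndex:1];
NSUInteger argumentBufferLength = argumentEncoder.encodedLength * textureCount;
id<MTLBuffer> argumentBuffer = [device newBufferWithLength:argumentBufferLength
                                                   options:MTLResourceStorageModeShared];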
After that, in your setup code, you can just do this
for(size_t i = 0; i < textureCount; i++)
{
// We basically just offset into an array of SingleTexture entries
[argumentEncoder setArgumentBuffer:<your buffer you just created> offset:argumentEncoder.encodedLength * i];
[argumentEncoder setTexture:textures[i] atIndex:0];
}
And then, when you are done encoding the buffer, you can hold on to it until your texture array changes (you don't need to re-encode it every frame).
Then you need to bind the argument buffer to buffer binding point 1, just as you would bind any other buffer.
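For a render pass that could look like this (encoder and argumentBuffer as in the snippets above):
[encoder setFragmentBuffer:argumentBuffer offset:0 atIndex:1];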
The last thing you need to do is make sure all the resources referenced indirectly are resident on the GPU. Since you encoded your textures into the argument buffer, the driver has no way of knowing whether you use them, because you are not binding them directly.
To do that, use useResource or useResources variation on an encoder you are using, kinda like this:
[encoder useResources:&textures[0] count:textureCount usage:MTLResourceUsageRead];
This is kinda a mouthful, but this is the proper way to bind anything you want to your shaders.

In Metal, can you reuse buffer argument table indexes during a pass?

I see example code in which different buffers are put at the same index during a single render pass. Like this:
renderEncoder.setVertexBuffer(firstBuffer, offset: 0, index: 0)
renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vcount1)
renderEncoder.setVertexBuffer(secondBuffer, offset: 0, index: 0)
renderEncoder.drawPrimitives(type: .point, vertexStart: 0, vertexCount: vcount2)
The index parameter is an index into the "buffer argument table", which has 32 entries, so the legal values are 0 to 31.
But I also see documentation that says you can't change the contents of a buffer until after the GPU completes its work on the given render pass.
So, is the above code legal and not prone to any timing issues?
If so, I guess that means the limit of 32 is a limit on how many buffers you can use in a single draw call, not how many buffers you can use in a single pass, aka MTLCommandBuffer. Correct?
You can't change the contents of the buffers themselves, meaning the MTLBuffer objects. What you can change is which buffers are bound. When you call setVertexBuffer, the command encoder remembers which buffer you bound at that index until you bind nil or another buffer. Every time you issue a draw command (like drawPrimitives) or a dispatch command (like dispatchThreadgroups), the current bindings are "saved" for that command, so you can go ahead and bind new buffers (and also textures) for the next one.

Weird case with MemoryLayout using a struct protocol, different size reported

I'm working on a drawing engine using Metal. I am reworking it from a previous version, so I'm starting from scratch.
I was getting the error: Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)
After some debugging I placed the blame on my drawPrimitives routine, and I found the case quite interesting.
I will have a variety of brushes, and all of them will work with specific vertex info.
So I said, why not have all the brushes respond to a protocol?
The protocol for the vertices will be this:
protocol MetalVertice {}
And the Vertex info used by this specific brush will be:
struct PointVertex: MetalVertice {
    var pointId: UInt32
    let relativePosition: UInt32
}
The brush can be called either with previously created vertices or by calling a function to create those vertices. Either way, the real drawing happens in the vertex function:
var vertices:[PointVertex] = [PointVertex].init(repeating: PointVertex(pointId: 0,
relativePosition: 0),
count: totalVertices)
    for (verticeIdx, pointIndex) in pointsIndices.enumerated() {
        vertices[verticeIdx].pointId = UInt32(pointIndex)
    }
    for vertice in vertices {
        print("size: \(MemoryLayout.size(ofValue: vertice))")
    }
    self.renderVertices(vertices: vertices,
                        forStroke: stroke,
                        inDrawing: drawing,
                        commandEncoder: commandEncoder)
    return vertices
}
func renderVertices(vertices: [MetalVertice], forStroke stroke: LFStroke, inDrawing drawing:LFDrawing, commandEncoder: MTLRenderCommandEncoder) {
    if vertices.count > 1 {
        print("vertices a escribir: \(vertices.count)")
        print("stride: \(MemoryLayout<PointVertex>.stride)")
        print("size of array \(MemoryLayout.size(ofValue: vertices))")
        for vertice in vertices {
            print("ispointvertex: \(vertice is PointVertex)")
            print("size: \(MemoryLayout.size(ofValue: vertice))")
        }
    }
let vertexBuffer = LFDrawing.device.makeBuffer(bytes: vertices,
length: MemoryLayout<PointVertex>.stride * vertices.count,
options: [])
This was the issue: calling this specific code produces these results in the console:
size: 8
size: 8
vertices a escribir: 2
stride: 8
size of array 8
ispointvertex: true
size: 40
ispointvertex: true
size: 40
In the previous function the size of the vertices is 8 bytes, but for some reason, when they enter the next function they turn into 40 bytes, so the buffer is incorrectly constructed.
If I change the function signature to:
func renderVertices(vertices: [PointVertex], forStroke stroke: LFStroke, inDrawing drawing:LFDrawing, commandEncoder: MTLRenderCommandEncoder) {
The vertices are correctly reported as 8 bytes long and the draw routine works as intended.
Is there anything I'm missing? Is the MetalVertice protocol introducing some noise?
In order to fulfill the requirement that value types conforming to protocols be able to perform dynamic dispatch (and also in part to ensure that containers of protocol types are able to assume that all of their elements are of uniform size), Swift uses what are called existential containers to hold the data of protocol-conforming value types alongside metadata that points to the concrete implementations of each protocol. If you've heard the term protocol witness table, that's what's getting in your way here.
The particulars of this are beyond the scope of this answer, but you can check out this video and this post for more info.
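A quick way to see that container directly; the exact numbers are implementation details of the Swift runtime, but on a 64-bit platform a simple protocol existential is 40 bytes (a three-word inline value buffer, a type metadata pointer, and one witness table pointer), which matches what the question measured:
protocol MetalVertice {}
struct PointVertex: MetalVertice {
    var pointId: UInt32
    let relativePosition: UInt32
}

print(MemoryLayout<PointVertex>.stride)   // 8  - the raw struct
print(MemoryLayout<MetalVertice>.size)    // 40 - the existential box that a [MetalVertice] actually stores
That 40-byte box is what each element of a [MetalVertice] array occupies, which is why the buffer built from it has the wrong layout.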
The moral of the story is: don't assume that Swift will lay out your structs as written. Swift can reorder struct members and add padding or arbitrary metadata, and it gives you practically no control over this. Instead, declare the structs you need to use in your Metal code in a C or Objective-C file and import them via a bridging header. If you want to use protocols to make it easier to address your structs polymorphically, you need to be prepared to copy them member-wise into your plain C structs, and to pay the memory cost that that convenience entails.
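For example, a plain C header (the file name here is hypothetical) imported through the project's bridging header pins the layout down to the 8 bytes the shader expects:
// PointVertexTypes.h -- imported into Swift via the bridging header
#ifndef PointVertexTypes_h
#define PointVertexTypes_h

#include <stdint.h>

typedef struct {
    uint32_t pointId;           // plain C layout: two 4-byte fields, 8 bytes total, no Swift metadata
    uint32_t relativePosition;
} PointVertex;

#endif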

How is data laid out in RAM?

I have a basic architecture question. How are multi-dimensional arrays laid out in memory? Is it correct that the data is laid out linearly in memory? If so, is it correct that in row-major order the data is stored row by row (first row, then second row, ...) and in column-major order it is stored column by column?
Thanks
The representation of an array depends upon the programming language. Most languages (C and its progeny being notable exceptions) represent arrays using a descriptor. The descriptor specifies the number of dimensions, the upper and lower bounds of each dimension, and where the data is located.
Usually, all the data for the array is stored contiguously. Even when stored contiguously, the ordering depends upon the language. In some languages [0, 0, 0] is stored next to [1, 0, 0] (column major, e.g. FORTRAN). In others [0, 0, 0] is next to [0, 0, 1], and [0, 0, 0] and [1, 0, 0] are far apart (row major, e.g. Pascal). Some languages, such as Ada, leave the ordering up to the compiler implementation.
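As a concrete sketch of the two orderings (these helper functions are illustrative, not from the question), the linear offset of element (i, j) in a rows x cols array is:
#include <stddef.h>

/* Row major: rows are laid end to end, so stepping j touches adjacent memory. */
size_t row_major_offset(size_t i, size_t j, size_t cols) { return i * cols + j; }

/* Column major: columns are laid end to end, so stepping i touches adjacent memory. */
size_t col_major_offset(size_t i, size_t j, size_t rows) { return j * rows + i; }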
Each array is stored in sequence, naturally. It makes no sense to spread data all over the place.
Example in C:
int matrix[10][10];
matrix[9][1] = 1234;
printf("%d\n", matrix[9][1]); // prints 1234
printf("%d\n", ((int*)matrix)[9 * 10 + 1]); // prints 1234
Of course there is nothing forcing you to organize data this way; if you want to make a mess, you can.
For example, if instead of using an array of arrays you decide to dynamically allocate your matrix:
int **matrix;
matrix = malloc(10 * sizeof(int*));
for (int i = 0; i < 10; ++i)
    matrix[i] = malloc(10 * sizeof(int));
The above example is most likely still stored in sequence, but certainly not in a contiguous manner, because there are 11 different memory blocks allocated and the memory manager is free to allocate them wherever it makes sense to it.

Wrong semaphore in case of OpenCL usage

Solution:
Finally I could solve, or at least find a good workaround for, my problem.
This kind of semaphore doesn't work on NVIDIA.
I think this comment is right.
So I decided to use atomic_add(), which is a mandatory part of OpenCL 1.1.
I have a resultBuffer array and a resultBufferSize global variable, and the latter is set to zero.
When I have results (my result is always exactly x numbers) I simply call
position = atomic_add(resultBufferSize, x);
and I can be sure nothing else writes into the buffer between position and position + x.
Don't forget that the global variable must be volatile.
When the threads run into endless loops, the resource is not available, hence the -5 error code when reading the buffer.
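A minimal sketch of that reservation pattern (kernel and argument names are illustrative; x is 3 here, and resultBufferSize must point to a zero-initialized uint in global memory):
__kernel void collectResults(__global uint *resultBuffer,
                             volatile __global uint *resultBufferSize)
{
    uint gid = (uint)get_global_id(0);
    uint myResults[3] = { gid, gid + 1u, gid + 2u };    // whatever this work-item produced (x = 3 values)

    uint position = atomic_add(resultBufferSize, 3u);   // reserve 3 slots atomically
    for (uint i = 0; i < 3u; i++)
        resultBuffer[position + i] = myResults[i];      // no other work-item writes to these slots
}
The host then reads back resultBufferSize to know how many results were actually written.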
Update:
When I read back:
oclErr |= clEnqueueReadBuffer(cqCommandQueue, cm_inputNodesArraySizes, CL_TRUE, 0, lastMapCounter*sizeof(cl_uint), (void*)&inputNodesArraySizes, 0, NULL, NULL);
The value of lastMapCounter changes. It's strange, because in the OpenCL code I do nothing with it, and I take care of the sizes: what I specified at buffer creation, what I copy, and what I read back are the same. A hidden buffer overflow can cause many strange things indeed.
End of update
I wrote the following code and there is a bug in it. I want a semaphore to protect changes to the resultBufferSize global variable (for now I just want to try out how it works) and get back a big number (each worker is supposed to write something). But I always get 3 back, or sometimes errors. I can't see any logic in how it behaves.
__kernel void findCircles(__global uint *inputNodesArray, __global
uint*inputNodesArraySizes, uint lastMapCounter,
__global uint *resultBuffer,
__global uint *resultBufferSize, volatile __global uint *sem)
{
for(;atom_xchg(sem, 1) > 0;)
(*resultBufferSize) = (*resultBufferSize) + 3;
atom_xchg(sem, 0);
}
I get -48 during the kernel execution, and sometimes it's OK but I get -5 when I want to read back the buffer (the size buffer).
Do you have any idea where I can find the bug?
NVIDIA OpenCL 1.1 is being used.
Of course, on the host I configure everything properly:
uint32 resultBufferSize = 0;
uint32 sem;
cl_mem cmresultBufferSize = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE,
sizeof(uint32), NULL, &ciErrNum);
cl_mem cmsem = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE, sizeof(uint32), NULL,
&ciErrNum);
ciErrNum = clSetKernelArg(ckKernel, 4, sizeof(cl_mem), (void*)&cmresultBufferSize);
ciErrNum = clSetKernelArg(ckKernel, 5, sizeof(cl_mem), (void*)&cmsem);
ciErrNum |= clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL,
&szGlobalWorkSize, &szLocalWorkSize, 0, NULL, NULL);
ciErrNum = clEnqueueReadBuffer(cqCommandQueue, cmresultBufferSize, CL_TRUE, 0,
sizeof(uint32), (void*)&resultBufferSize, 0, NULL, NULL);
(In the case of this code the kernel runs OK and the last read returns -5.)
I know you have come to a conclusion on this, but I want to point out two things:
1) The semaphore is non-portable because it isn't SIMD safe, as pointed out in the linked thread.
2) The memory model is not strong enough to give a meaning to the code. The update of the result buffer could move out of the critical section; nothing in the model says otherwise. At the very least you'd need fences, but the language around fences in the 1.x specs is also fairly weak. You'd need an OpenCL 2.0 implementation to be confident that this aspect is safe.
