When to bindTexture with manual mipmaps - webgl

When setting mipmap levels for a texture manually, which is the correct strategy for where to loop on each level?
Note: I'm assuming gl.createTexture() definitely falls outside the loop.
OPTION A
gl.bindTexture(bindTarget, texture);
gl.texParameteri();
For each mipmap level:
    gl.texImage2D();
OPTION B
gl.bindTexture(bindTarget, texture);
For each mipmap level:
    gl.texParameteri();
    gl.texImage2D();
OPTION C
For each mipmap level:
    gl.bindTexture(bindTarget, texture);
    gl.texParameteri();
    gl.texImage2D();

Texture parameters apply to the entire texture, not to individual mip levels, so Option A.
You could derive this from the fact that those same parameters (the minification filter, for example) decide whether mips are used at all.
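A minimal sketch of Option A, assuming a 2D RGBA8 texture. The function name uploadMipChain and the shape of the `levels` array are hypothetical, made up for illustration; the gl calls and constants are the standard WebGL ones.

```javascript
// Option A: bind once, set parameters once, then upload every level.
// `levels` is a hypothetical array of { width, height, pixels } objects,
// ordered from level 0 (full size) down to the smallest level.
function uploadMipChain(gl, texture, levels) {
  gl.bindTexture(gl.TEXTURE_2D, texture);

  // Sampling parameters belong to the texture object, not to a mip level.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);

  // Only the image upload is per level; the array index is the mip level.
  levels.forEach((mip, level) => {
    gl.texImage2D(
      gl.TEXTURE_2D, level, gl.RGBA,
      mip.width, mip.height, 0,
      gl.RGBA, gl.UNSIGNED_BYTE, mip.pixels
    );
  });
}
```

The binding stays in effect until another texture is bound to the same target, which is why neither bindTexture nor texParameteri needs to be repeated inside the loop.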

Related

Is there a way to bind assets once instead of with every command encoder?

I'm rendering a vertex/frag shader with a compute kernel.
Every frame I am binding large assets (such as a 450MB texture) in the usual way:
computeEncoder.setTexture(highResTexture, index: 0)
computeEncoder.setBuffer(largeBuffer, offset: 0, index: 0)
...
renderEncoder.setVertexTexture(highResTexture, index: 0)
renderEncoder.setVertexBuffer(largeBuffer, offset: 0, index: 0)
So that is close to 1GB in bandwidth for a single texture, and I have many more assets totaling a few hundred megs, so that is about 1.5GB that I bind for every frame.
Is there any way to bind textures/buffers to the GPU once so that they are then available in the kernel and vertex functions without binding them every frame?
I could be wrong, but I thought something like this was introduced at one of the last couple of WWDCs, so I thought I would ask to make sure I'm not missing anything.
EDIT:
By simply binding a texture in the vertex function that I have already bound in the compute encoder it does indeed show more texture bandwidth used, even though I am not using it for the capture.
GPU Read Bandwidth:
6.3920 GiB/s without binding
7.1919 GiB/s with binding
[Screenshot: GPU capture without binding the texture]
[Screenshot: GPU capture with the texture bound but not used in any way]
Also, if it works as you describe, why does using multiple command encoders warn about wasted bandwidth? If I use more than one emitter, each with a separate encoder, even though they bind identical resources, I get the performance warning:
I think you are confused. Setting a texture to a command encoder doesn't consume bandwidth. Reading it or sampling it inside the shader does.
When you set a texture or any other buffer on an encoder, the driver just passes a small amount of metadata to the shader using some mechanism, likely an internal buffer that's not visible to you as the API user. It doesn't "load" the texture anywhere. There's an exception for buffers that are marked as constant address space buffers in the shaders, because those may get pre-fetched by the GPU for better performance.
Another thing that happens is that the resource is made resident, meaning the GPU driver maps a range of addresses in the GPU's virtual address table to point to the physical memory that stores the texture contents. This also does not consume memory, but it does consume available virtual address space. You might run out of virtual address space in some cases, but that's not a bandwidth issue.
Still, if you do have a lot of textures, you might be actually spending a lot of CPU time just encoding those setTexture commands. Instead, you can use argument buffers. If the hardware you are targeting supports argument buffers tier 2, you can put every texture in an argument buffer. This will require calling useResource on all of those textures, because the driver needs to know that you are going to use those textures to make them resident, so you will still spend CPU time encoding those commands. To avoid that, you can allocate all the textures from one or more heaps and call useHeaps on those heaps. This will make the whole heap resident, and you won't need to call useResource on individual resources. There are a bunch of WWDC talks on this topic, latest one being Explore bindless rendering in Metal.
But again, to reiterate: nothing I mentioned here "wastes" bandwidth.
Update:
A very basic example of using an argument buffer looks like this:
// Describe a single 2D texture argument at index 0.
let argumentDescriptor = MTLArgumentDescriptor()
argumentDescriptor.index = 0
argumentDescriptor.dataType = .texture
argumentDescriptor.textureType = .type2D
// Argument encoders are created by the device, not constructed directly.
let argumentEncoder = device.makeArgumentEncoder(arguments: [argumentDescriptor])!
let argumentBuffer = device.makeBuffer(length: argumentEncoder.encodedLength, options: [.storageModeShared])!
argumentEncoder.setArgumentBuffer(argumentBuffer, offset: 0)
argumentEncoder.setTexture(someTexture, index: 0)
// Bind the argument buffer and tell the driver the texture must be resident.
commandEncoder.setBuffer(argumentBuffer, offset: 0, index: 0)
commandEncoder.useResource(someTexture, usage: .read)
And in the shader you would write a struct like this:
struct MyTexture
{
texture2d<float> texture [[ id(0) ]];
};
and then bind it like
device MyTexture& myTexture [[ buffer(0) ]]
and use it like any other struct. This is a very basic example; you can also have Metal create those MTLArgumentEncoders for you from a function and a binding index, for example with MTLFunction's makeArgumentEncoder(bufferIndex:).

Equivalent of glColorMask in Metal for a kernel program?

I am trying to move from OpenGL to Metal for my iOS apps. In my OpenGL code I use glColorMask (if I want to write only to selected channels, for example only to alpha channel of a texture) in many places.
In Metal, for a render pipeline (through vertex and fragment shaders), MTLColorWriteMask seems to be the equivalent of glColorMask. I can set it up through the MTLRenderPipelineDescriptor when creating an MTLRenderPipelineState.
But I could not find a similar option for compute pipeline (through kernel function). I always need to write all the channels (red, green, blue and alpha) every time I write to an output texture. What if I want to preserve the alpha (or any other channel) and only want to modify the color channels? I can create a copy of the output texture and use it as one of the inputs and read alpha from it to preserve the values but that is expensive.
Computer memory architectures don't like writing only some bytes of data. A write to 1 out of 4 bytes usually involves reading those four bytes into the cache, modifying one of them in the cache, and then writing those four bytes back out into memory. Well, most computers read/write a lot more than 4 bytes at a time, but you get the idea.
This happens with framebuffers too. If you use a partial write mask, the hardware is still going to be doing the equivalent of a read/modify/write on that texture. It's just not changing all of the bytes it reads.
So you can do the same thing from your compute shader. Read the 4-vector value, modify the channels you want, and then write it back out. As long as the read and write are from the same shader invocation, there should be no synchronization problems (assuming that no other invocations are trying to read/write to that same location, but if that were the case, you'd have problems anyway).
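The read/modify/write described above can be illustrated on the CPU. This is only a sketch of the per-pixel logic a kernel invocation would perform, not Metal code: the function name writeMasked, the RGBA8 Uint8Array layout, and the boolean mask format are all assumptions made for the example.

```javascript
// CPU-side illustration of a masked write on tightly packed RGBA8 pixels.
// mask is [r, g, b, a]; only channels whose mask entry is true are replaced.
function writeMasked(pixels, pixelIndex, newValue, mask) {
  const base = pixelIndex * 4;

  // 1. Read the full 4-channel value.
  const current = pixels.slice(base, base + 4);

  // 2. Modify only the selected channels.
  for (let channel = 0; channel < 4; channel++) {
    if (mask[channel]) {
      current[channel] = newValue[channel];
    }
  }

  // 3. Write all four channels back.
  pixels.set(current, base);
}

// Usage: overwrite the color of pixel 1 but preserve its alpha.
const image = Uint8Array.from([10, 20, 30, 40, 50, 60, 70, 80]);
writeMasked(image, 1, [1, 2, 3, 4], [true, true, true, false]);
```

In a compute kernel the same three steps would be a read and a write of the output texture at the same coordinate within one invocation, which presumes the texture can be both read and written by that kernel.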

Vulkan texture rendering on multiple meshes

I am trying to render different textures on multiple meshes of a model, but I do not have much of a clue about the procedure. Someone suggested creating a separate descriptor set for each mesh and calling vkCmdBindDescriptorSets() and vkCmdDrawIndexed() for rendering, like this:
// Pipeline with descriptor set layout that matches the shared descriptor sets
vkCmdBindPipeline(...pipelines.mesh...);
...
// Mesh A
vkCmdBindDescriptorSets(...&meshA.descriptorSet... );
vkCmdDrawIndexed(...);
// Mesh B
vkCmdBindDescriptorSets(...&meshB.descriptorSet... );
vkCmdDrawIndexed(...);
However, the above approach is quite different from the chopper sample and Vulkan's samples, so I have no idea where to start making the change. I would really appreciate any help pointing me in the right direction.
Cheers
You have a conceptual object which is made of multiple meshes which have different texturing needs. The general ways to deal with this are:
1. Change descriptor sets between parts of the object. Painful, but it works on all Vulkan-capable hardware.
2. Employ array textures. Each individual mesh fetches its data from a particular layer in the array texture. Of course, this restricts you to having each sub-mesh use textures of the same size. But it works on all Vulkan-capable hardware (up to 128 array elements, minimum). The array layer for a particular mesh can be provided as a push-constant, or a base instance if that's available.
   Note that if you manage to be able to do it by base instance, then you can render the entire object with a multi-draw indirect command. Though it's not clear that a short multi-draw indirect would be faster than just baking a short sequence of drawing commands into a command buffer.
3. Employ sampler arrays, as Sascha Willems suggests. Presumably, the array index for the sub-mesh is provided as a push-constant or a multi-draw's draw index. The problem is that, regardless of how that array index is provided, it will have to be a dynamically uniform expression. And Vulkan implementations are not required to allow you to index a sampler array with a dynamically uniform expression. The base requirement is just a constant expression.
   This limits you to hardware that supports the shaderSampledImageArrayDynamicIndexing feature. So you have to ask for that, and if it's not available, then you've got to work around that with #1 or #2. Or just don't run on that hardware. But the last one means that you can't run on any mobile hardware, since most of them don't support this feature as of yet.
   Note that I am not saying you shouldn't use this method. I just want you to be aware that there are costs. There's a lot of hardware out there that can't do this. So you need to plan for that.
The person that suggested the above code fragment was me I guess ;)
This is only one way of doing it. You don't necessarily have to create one descriptor set per mesh or per texture. If your mesh e.g. uses 4 different textures, you could bind all of them at once to different binding points and select them in the shader.
And if you take a look at NVIDIA's chopper sample, they do it pretty much the same way, only with some more abstraction.
The example also sets up descriptor sets for the textures used:
VkDescriptorSet *textureDescriptors = m_renderer->getTextureDescriptorSets();
binds them a few lines later:
VkDescriptorSet sets[3] = { sceneDescriptor, textureDescriptors[0], m_transform_descriptor_set };
vkCmdBindDescriptorSets(m_draw_command[inCommandIndex], VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3, sets, 0, NULL);
and then renders the mesh with the bound descriptor sets:
vkCmdDrawIndexedIndirect(m_draw_command[inCommandIndex], sceneIndirectBuffer, 0, inCount, sizeof(VkDrawIndexedIndirectCommand));
vkCmdDraw(m_draw_command[inCommandIndex], 1, 1, 0, 0);
If you take a look at initDescriptorSets you can see that they also create separate descriptor sets for the cubemap, the terrain, etc.
The LunarG examples should work similarly, though if I'm not mistaken they never use more than one texture.

Filtering Interpolation method for Core Image sampler

I am using Core Image to implement my custom image processing filter. The kernel has two input sampler parameters:
kernel vec4 filterk(sampler image, sampler db)
The second sampler is a lookup table, so its values must not be altered when they are read.
When I retrieve values from the db sampler, it seems that some interpolation is applied.
I have implemented this filter on Android using an OpenGL shader with the filtering modes set to GL_NEAREST:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
With these parameters set everything works correctly (unlike when GL_LINEAR is set).
The documentation says that the default interpolation for CISampler is bilinear and that nearest neighbor is also available.
How can I create a CISampler and set its interpolation method to nearest neighbor with the iOS SDK?
P.S.
Also, the Core Image Kernel Language documentation says that there is a __table keyword which makes a sampler be used as a lookup table. But Xcode reports an error when this keyword is used: unknown type name '__table'.
P.P.S.
I also tried creating a CIImage using initWithTexture with a texture whose filtering properties had been set to GL_NEAREST. This did not work either, and the documentation says that CIImage ignores filtering and wrap modes since CISampler overrides them.
You would need to use the CISampler class, which is not available on iOS (see the Open Radar report). That means specifying the interpolation is not possible at this time.
On Mac OS X (or when the class becomes available on iOS) you would specify kCISamplerFilterNearest for the kCISamplerFilterMode option key of your CISampler. See the CISampler documentation.
Example:
CISampler* src = [CISampler samplerWithImage:inputImage options:
[NSDictionary dictionaryWithObjectsAndKeys:kCISamplerFilterNearest, kCISamplerFilterMode, nil]];

DirectX11: Pass data from ComputeShader to VertexShader?

Is it possible to apply a filter to the geometry data that is to be rendered using Compute Shader and then use the result as an input buffer in the Vertex Shader? That would save me the trouble (&time) of reading back the data.
Any help is much appreciated.
Yes, absolutely. First you create two identical ID3D11Buffers of structures with the D3D11_BIND_VERTEX_BUFFER, D3D11_BIND_SHADER_RESOURCE and D3D11_BIND_UNORDERED_ACCESS bind flags, along with the associated UAVs and SRVs.
The first step is to apply your filter to the input source buffer and write to the destination buffer during your compute pass.
Then, during the draw pass, you just have to bind the destination buffer to the IA stage. You can do some ping-pong if you need to accumulate computations on the vertices (I assume that by filter you mean a functional map, in the functional-programming sense).
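The ping-pong idea can be shown with a CPU-side sketch. This is not D3D11 code: the two typed arrays stand in for the two GPU buffers, the `filter` callback stands in for the compute shader, and the function name runPasses is hypothetical.

```javascript
// CPU-side illustration of ping-pong buffering between two equal-sized buffers.
// Each pass reads from `source`, writes to `destination`, then swaps the roles.
function runPasses(initial, filter, passCount) {
  let source = Float32Array.from(initial);
  let destination = new Float32Array(source.length);

  for (let pass = 0; pass < passCount; pass++) {
    // The "compute pass": apply the filter as a map over every element.
    for (let i = 0; i < source.length; i++) {
      destination[i] = filter(source[i]);
    }
    // Swap roles so the next pass reads what this pass just wrote.
    [source, destination] = [destination, source];
  }

  // After the final swap, `source` holds the most recent results.
  return source;
}

// Usage: three passes of a filter that doubles each value.
const accumulated = runPasses([1, 2, 3], (v) => v * 2, 3);
```

On the GPU the swap would amount to exchanging which buffer's SRV is read and which buffer's UAV is written on each pass, then binding whichever buffer was written last as the vertex buffer.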

Resources