I would like to use MPSImageGaussianPyramid, but I am very new to Metal and to mipmaps. I would like to use the filter to produce an image pyramid for image processing techniques.
From what I'm able to gather, MPSImageGaussianPyramid creates a mipmapped image; however, in my code I'm having a hard time even making sure that I'm seeing the output correctly. Are there any examples where this filter has been used correctly? My questions are:
How does one access the mipmapped images after the filter has been applied?
Is it possible to copy the mipmapped images to another image for processing?
Is this mipmapped image going to be faster than manually creating a pyramid through custom filters?
Thanks, and I will provide some sample code later that I've not been able to get working.
A few pieces of advice for working with MPS kernels in general, and the image pyramid filters in particular:
If you're going to be using a kernel more than once, cache it and reuse it rather than creating a kernel every time you need to encode.
Consider setting the edgeMode property of the kernel to .clamp when downsampling, since sampling out-of-bounds (as the Gaussian pyramid does on the first step) will return black by default and introduce artificially dark pixels.
When encoding a Gaussian pyramid kernel, always use the "in-place" method, without providing a fallback allocator:
kernel.encode(commandBuffer: commandBuffer, inPlaceTexture: &myTexture)
As you note, running an image pyramid kernel puts the result in the available mip levels of the texture being downsampled. This means that the texture you provide should already have as many mip levels allocated as you want filled. Ensure that the descriptor you use to create your texture has an appropriate mipmapLevelCount: the texture2DDescriptor(pixelFormat:width:height:mipmapped:) convenience method takes care of this when you pass mipmapped: true, and MTKTextureLoader does so when you pass the .allocateMipmaps option.
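Putting the advice above together, a minimal Swift sketch might look like this (the pixel format, size, and usage flags are placeholders for your own setup):

import Metal
import MetalPerformanceShaders

let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!

// Allocate a full mip chain up front; mipmapped: true sets mipmapLevelCount
// to floor(log2(max(width, height))) + 1.
let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm,
                                                          width: 1024,
                                                          height: 1024,
                                                          mipmapped: true)
descriptor.usage = [.shaderRead, .shaderWrite, .pixelFormatView]
var texture = device.makeTexture(descriptor: descriptor)!

// Create the kernel once and cache it; clamp avoids dark pixels at the edges.
let pyramid = MPSImageGaussianPyramid(device: device)
pyramid.edgeMode = .clamp

// ... fill mip level 0 of `texture` with your source image here ...

let commandBuffer = commandQueue.makeCommandBuffer()!
pyramid.encode(commandBuffer: commandBuffer, inPlaceTexture: &texture)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()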
Assuming that you now know how to encode a kernel and get the desired results into your texture, here are some answers to your questions:
1. How does one access the mipmapped images after the filter has been applied?
You can implicitly use the mipmaps in a shader when rendering by using a sampler that has a mip filter, or you can explicitly sample from a particular mip level by passing a lod_option parameter of type level to the sample function:
constexpr sampler mySampler(coord::normalized, filter::linear, mip_filter::linear);
float4 color = myTexture.sample(mySampler, texCoords, level(selectedLod));
This works in compute kernels as well as rendering functions. Use a mip filter of nearest or round the selected LOD if you want to sample from a single mip level rather than using trilinear mip filtering.
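If you'd rather configure the sampler from the host side instead of declaring it in the shader, the equivalent setup in Swift might look like this (a sketch; device is assumed to exist already):

let samplerDescriptor = MTLSamplerDescriptor()
samplerDescriptor.normalizedCoordinates = true
samplerDescriptor.minFilter = .linear
samplerDescriptor.magFilter = .linear
samplerDescriptor.mipFilter = .linear // use .nearest to sample a single mip level
let sampler = device.makeSamplerState(descriptor: samplerDescriptor)!
// e.g. renderEncoder.setFragmentSamplerState(sampler, index: 0)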
2. Is it possible to copy the mipmapped images to another image for processing?
Since a texture that is downsampled by an image pyramid kernel must already have the .pixelFormatView usage flag, you can create a texture view on a mipped texture that selects one or more mip levels. For example, if you want to select the first and higher mip levels (dropping the base level), you could do that like this:
let textureView = myTexture.makeTextureView(pixelFormat: myTexture.pixelFormat,
                                            textureType: myTexture.textureType,
                                            levels: 1..<myTexture.mipmapLevelCount,
                                            slices: 0..<1)
You can also use a blit command encoder to copy from one texture to another, specifying which mip levels to include. This allows you to free the original texture if you want to reclaim the memory used by the lower mip levels.
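As a sketch of that blit approach (pyramidTexture is a hypothetical destination whose base level matches mip level 1 of myTexture, and commandBuffer is assumed to be open):

let blitEncoder = commandBuffer.makeBlitCommandEncoder()!
for level in 1..<myTexture.mipmapLevelCount {
    // Each mip level is half the size of the previous one, clamped to 1.
    let levelWidth = max(1, myTexture.width >> level)
    let levelHeight = max(1, myTexture.height >> level)
    blitEncoder.copy(from: myTexture,
                     sourceSlice: 0,
                     sourceLevel: level,
                     sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
                     sourceSize: MTLSize(width: levelWidth, height: levelHeight, depth: 1),
                     to: pyramidTexture,
                     destinationSlice: 0,
                     destinationLevel: level - 1,
                     destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
}
blitEncoder.endEncoding()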
You can wrap a MTLTexture with an MPSImage if you want to use APIs that work with images rather than textures:
let image = MPSImage(texture: myTexture, featureChannels: 4)
3. Is this mipmapped image going to be faster than manually creating a pyramid through custom filters?
Almost certainly. Metal Performance Shaders are tuned for each generation of devices and have numerous heuristics that optimize execution speed and energy use.
Related
I am trying to implement the MoG background subtraction algorithm based on the OpenCV CUDA implementation.
What I need is to maintain a set of Gaussian parameters independently for each pixel location across multiple frames. Currently I am just allocating a single big MTLBuffer to do the job, and on every frame I have to invoke the commandEncoder.setBuffer API. Is there a better way? I read about imageblocks, but I am not sure if they are relevant here.
Also, I would be really happy if you could spot anything that shouldn't be translated directly from CUDA to Metal.
Allocate an 8-bit texture and store intermediate values into it in your compute shader. Once this texture has been written, you can rebind it as an input texture to whatever other methods need to read from it in the rest of the renders. You can find a very detailed example of this sort of thing in the MetalPrefixSum project on GitHub, which implements a parallel prefix sum on top of Metal. That example also shows how to write XCTest regression tests for your Metal shaders.
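For example, a sketch of allocating such a state texture and binding it to a compute encoder (names like stateTexture, frameWidth, and computeEncoder are placeholders for your own objects):

let stateDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .r8Unorm,
                                                               width: frameWidth,
                                                               height: frameHeight,
                                                               mipmapped: false)
stateDescriptor.usage = [.shaderRead, .shaderWrite] // written by one pass, read by the next
let stateTexture = device.makeTexture(descriptor: stateDescriptor)!
computeEncoder.setTexture(stateTexture, index: 1) // bind alongside the input frame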
My question is specifically about Metal, since I don't know if the answer would change for another API.
What I believe I understand so far is this:
A mipmapped texture has precomputed "levels of detail", where lower levels of detail are created by downsampling the original texture in some meaningful way.
Mipmap levels are referred to in descending level of detail, where level 0 is the original texture, and higher levels are power-of-two reductions of it.
Most GPUs implement trilinear filtering, which picks two neighboring mipmap levels for each sample, samples from each level using bilinear filtering, and then linearly blends those samples.
What I don't quite understand is how these mipmap levels are selected. In the documentation for the Metal standard library, I see that samples can be taken, with or without specifying an instance of a lod_options type. I would assume that this argument changes how the mipmap levels are selected, and there are apparently three kinds of lod_options for 2D textures:
bias(float value)
level(float lod)
gradient2d(float2 dPdx, float2 dPdy)
Unfortunately, the documentation doesn't bother explaining what any of these options do. I can guess that bias() biases some automatically chosen level of detail, but then what does the bias value mean? What scale does it operate on? Similarly, how is the lod of level() translated into discrete mipmap levels? And, operating under the assumption that gradient2d() uses the gradient of the texture coordinate, how does it use that gradient to select the mipmap level?
More importantly, if I omit the lod_options, how are the mipmap levels selected then? Does this differ depending on the type of function being executed?
And, if the default no-lod-options-specified operation of the sample() function is to do something like gradient2d() (at least in a fragment shader), does it utilize simple screen-space derivatives, or does it work directly with rasterizer and interpolated texture coordinates to calculate a precise gradient?
And finally, how consistent is any of this behavior from device to device? An old article (old as in DirectX 9) I read referred to complex device-specific mipmap selection, but I don't know if mipmap selection is better-defined on newer architectures.
This is a relatively big subject that you might be better off asking about on https://computergraphics.stackexchange.com/ but, very briefly: Lance Williams' paper "Pyramidal Parametrics", which introduced trilinear filtering and the term "MIP mapping", contains a suggestion from Paul Heckbert (see page three, first column) that I think may still be used, to an extent, in some systems.
In effect the approaches to computing the MIP map levels are usually based on the assumption that your screen pixel is a small circle and this circle can be mapped back onto the texture to get, approximately, an ellipse. You estimate the length of the longer axis expressed in texels of the highest resolution map. This then tells you which MIP maps you should sample. For example, if the length was 6, i.e. between 2^2 and 2^3, you'd want to blend between MIP map levels 2 and 3.
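As a rough sketch of that estimate (my own illustration, not from the paper): take the screen-space gradients of the texture coordinate measured in texels, use the longer axis of the resulting footprint, and its log2 gives the fractional level to blend at:

import simd
import Darwin

// dPdx/dPdy: change in texel coordinates per one-pixel step in screen x/y.
func estimatedLod(dPdx: SIMD2<Float>, dPdy: SIMD2<Float>) -> Float {
    let footprint = max(simd_length(dPdx), simd_length(dPdy)) // longer ellipse axis, in texels
    return log2(max(footprint, 1)) // a footprint of 6 gives ~2.58: blend levels 2 and 3
}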
Is it possible to process an MTLTexture in-place without osx_ReadWriteTextureTier2?
It seems like I can set two texture arguments to be the same texture. Is this supported behavior?
Specifically, I don't mind not having texture caching update after a write. I just want to in-place (and sparsely) modify a 3d texture. It's memory prohibitive to have two textures. And it's computationally expensive to copy the entire texture, especially when I might only be updating a small portion of it.
Per the documentation, regardless of feature availability, it is invalid to declare two separate texture arguments (one read, one write) in a function signature and then set the same texture for both.
Any Mac that supports osx_GPUFamily1_v2 supports function texture read-writes (by declaring the texture with access::read_write).
The distinction between "Tier 1" (which has no explicit constant) and osx_ReadWriteTextureTier2 is that the latter supports additional pixel formats for read-write textures.
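You can check this at runtime from Swift; a sketch:

import Metal

let device = MTLCreateSystemDefaultDevice()!
if device.supportsFeatureSet(.osx_ReadWriteTextureTier2) {
    // Tier 2: additional pixel formats usable with access::read_write
} else if device.supportsFeatureSet(.osx_GPUFamily1_v2) {
    // Tier 1: function texture read-writes with the base set of formats
} else {
    // No read-write texture support; fall back to a copy-based approach
}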
If you determine that your target Macs don't support the kind of texture read-writes you need (because you need to deploy to OS X 10.11, or because you're using a pixel format that's incompatible with the tier of the machines you're deploying to), you could operate on your texture one plane at a time: read from your 3D texture, write to a 2D texture, and then blit the result back into the corresponding region of your 3D texture. It's more work, but it'll use much less than double the memory.
I'm trying to implement fluid dynamics using compute shaders. The article I'm following describes a series of passes over a texture, since it was written before compute shaders existed.
Would it be faster to do each pass on a texture or buffer? The final pass would have to be applied to a texture anyways.
I would recommend using whichever dimensionality of resource fits the simulation. If it's a 1D simulation, use a RWBuffer, if it's a 2D simulation use a RWTexture2D and if it's a 3D simulation use a RWTexture3D.
There appear to be stages in the algorithm that you linked that make use of bilinear filtering. If you restrict yourself to using a Buffer you'll have to issue 4 or 8 memory fetches (depending on 2D or 3D) and then more instructions to calculate the weighted average. Take advantage of the hardware's ability to do this for you where possible.
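To make that concrete, here's a sketch of what a single manual bilinear fetch from a flat Float buffer costs in the 2D case (four reads plus the weighting, all of which a texture sampler does in fixed-function hardware; assumes width and height are at least 2):

func bilinearFetch(_ data: [Float], width: Int, height: Int, u: Float, v: Float) -> Float {
    // Map normalized coordinates to texel space, centered on texel centers.
    let x = u * Float(width) - 0.5
    let y = v * Float(height) - 0.5
    let x0 = min(max(Int(x.rounded(.down)), 0), width - 2)
    let y0 = min(max(Int(y.rounded(.down)), 0), height - 2)
    let fx = min(max(x - Float(x0), 0), 1)
    let fy = min(max(y - Float(y0), 0), 1)
    // Four memory fetches...
    let t00 = data[y0 * width + x0]
    let t10 = data[y0 * width + x0 + 1]
    let t01 = data[(y0 + 1) * width + x0]
    let t11 = data[(y0 + 1) * width + x0 + 1]
    // ...plus the weighted average the sampler would do for free.
    let top = t00 * (1 - fx) + t10 * fx
    let bottom = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bottom * fy
}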
Another thing to be aware of is that data in textures is not laid out row by row (linearly) as you might expect; instead it's laid out in such a way that neighbouring texels are as close to one another in memory as possible. This can be called Tiling or Swizzling, depending on whose documentation you read. For that reason, unless your simulation is one-dimensional, you may well get far better cache coherency on reads/writes from a resource whose layout most closely matches the dimensions of the simulation.
I'm writing a pixel shader that has the property that, for a given quad, the values returned vary only with the u-axis value. That is, for a fixed u, the color output is constant as v varies.
The computation to calculate the color at a pixel is relatively expensive, i.e. it does multiple samples per pixel, loops, etc.
Is there a way to take advantage of the v-invariance property? If I were doing this on a CPU, you'd obviously just cache the values once calculated, but I guess that doesn't apply because of parallelism. It might be possible for me to move the texture generation to the CPU side and have the shader access a Texture1D, but I'm not sure how fast that will be.
Is there a paradigm that fits this situation on GPU cards?
cheers
Storing your data in a 1D texture and sampling it in your pixel shader looks like a good solution. Your GPU will be able to use its texture caching features, taking advantage of the fact that many of your pixels are reading the same value from your 1D texture. This should be really fast; texture fetching and caching is one of the main reasons your GPU is so efficient at rendering.
It is common practice to make a trade-off between calculating a value in the pixel shader and using a lookup-table texture. You are doing complex calculations by the sound of it, so using a lookup texture will certainly improve performance.
Note that you could still generate this texture by the GPU, there is no need to move it to the CPU. Just render to this 1D texture using your existing shader code as a prepass.
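In Metal terms (to match the other examples in this thread), a sketch of allocating such a 1D lookup texture might look like this; the size and format are placeholders, and the texture is written by a compute prepass and then sampled during the main pass:

let lookupDescriptor = MTLTextureDescriptor()
lookupDescriptor.textureType = .type1D
lookupDescriptor.pixelFormat = .rgba16Float
lookupDescriptor.width = 1024 // one entry per distinct u value you need
lookupDescriptor.usage = [.shaderWrite, .shaderRead] // prepass writes, main pass samples
let lookupTexture = device.makeTexture(descriptor: lookupDescriptor)!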